Abstract

The conflation of a finite number of probability distributions $P_1,\dots,P_n$ is a consolidation of those distributions into a single probability distribution $Q = Q(P_1,\dots,P_n)$, where intuitively $Q$ is the conditional distribution of independent random variables $X_1,\dots,X_n$ with distributions $P_1,\dots,P_n$, respectively, given that $X_1 = \cdots = X_n$. Thus, in large classes of distributions the conflation is the distribution determined by the normalized product of the probability density or probability mass functions. $Q$ is shown to be the unique probability distribution that minimizes the loss of Shannon Information in consolidating the combined information from $P_1,\dots,P_n$ into a single distribution $Q$, and also to be the optimal consolidation of the distributions with respect to two minimax likelihood-ratio criteria. When $P_1,\dots,P_n$ are Gaussian, $Q$ is Gaussian with mean the classical weighted mean, with weights the reciprocals of the variances. A version of the classical convolution theorem holds for conflations of a large class of a.c. measures.
1 Introduction

Conflation is a method for consolidating a finite number of probability distributions $P_1,\dots,P_n$ into a single probability distribution $Q = Q(P_1,\dots,P_n)$. The study of this method was motivated by a basic problem in science, namely, how best to consolidate the information from several independent experiments, all designed to measure the same unknown quantity. The experiments may differ in time, geographical location, methodology and even in underlying theory. Ideally, of course, all experimental data, past as well as present, should be incorporated into the scientific record, but the result would be of limited practical application. For many purposes, a concise consolidation of those distributions is more useful.

For example, to obtain the current internationally-recognized values of each of the fundamental physical constants (Planck's constant, Avogadro's number, etc.), the U.S. National Institute of Standards and Technology (NIST) collects independent distributional data, often assumed to be Gaussian (see Section 6), from various laboratories. Then, for each fundamental physical constant, NIST combines the relevant input distributions to arrive at a recommended value and estimated standard deviation for the constant. Since these recommended values are usually interpreted as being Gaussian, NIST has effectively combined the several input distributions into a single probability distribution.

The problem of combining probability distributions has been well studied; e.g., [?] describes a "plethora of methods" for finding a summary $T(P_1,\dots,P_n)$ of $n$ given (subjective) probability measures $P_1,\dots,P_n$ that represent different expert opinions. All those methods, however, including the classical convex combination or weighted average ($T(P_1,\dots,P_n) = \sum_{i=1}^n w_iP_i$, with nonnegative weights $\{w_i\}$ satisfying $\sum_{i=1}^n w_i = 1$) and various nonlinear generalizations, are idempotent, i.e., $T(P,\dots,P) = P$. For the purpose of combining probability distributions that represent expert opinions, idempotency is a natural requirement, since if all the opinions $P_1,\dots,P_n$ agree, the best summary is that distribution.
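To make the contrast concrete, here is a small Python sketch (not from the paper) that represents a discrete distribution as a dict from atoms to exact `Fraction` masses. An equal-weight convex combination of $P$ with itself returns $P$, as idempotency requires, while the conflation, computed here as the normalized product of the p.m.f.'s (the form established for discrete inputs in Theorem 3.1), does not. The example distribution and the helper names `linear_pool` and `conflate_pmfs` are illustrative assumptions.

```python
from fractions import Fraction


def linear_pool(pmfs, weights):
    """Convex combination sum_i w_i P_i of p.m.f.s given as {atom: mass} dicts."""
    atoms = set().union(*pmfs)
    return {x: sum(w * p.get(x, 0) for w, p in zip(weights, pmfs)) for x in atoms}


def conflate_pmfs(pmfs):
    """Normalized product of p.m.f.s over their common atoms."""
    common = set.intersection(*(set(p) for p in pmfs))
    products = {}
    for x in common:
        mass = Fraction(1)
        for p in pmfs:
            mass *= p[x]
        products[x] = mass
    total = sum(products.values())
    if total == 0:
        raise ValueError("no common atoms: the conflation does not exist")
    return {x: m / total for x, m in products.items()}


P = {0: Fraction(1, 4), 1: Fraction(3, 4)}
half = Fraction(1, 2)
pooled = linear_pool([P, P], [half, half])  # equals P: the linear pool is idempotent
conflated = conflate_pmfs([P, P])           # 1/10 at 0 and 9/10 at 1: not idempotent
print(pooled == P, conflated)
```

Squaring the masses $1/4$ and $3/4$ and renormalizing sharpens the distribution toward the more likely atom, which is the discrete analog of the variance reduction discussed below.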

But for other objectives for combining distributions, such as consolidating the results of 
independent experiments, idempotency is not always a desirable property. Replications of 
the same underlying distribution by independent laboratories, for example, should perhaps 
best be summarized by a distribution with a smaller variance. In addition to the problem 
of assigning and justifying the unequal weights, another problem with the weighted-average
consolidation is that even with normally-distributed input data, this method generally
produces a multimodal distribution, whereas one might desire the consolidated output
distribution to be of the same general form as that of the input data: normal, or at least
unimodal. 

Another natural method of consolidating distributional data, one that does preserve normality and is not idempotent, is to average the underlying input data. In this case, the consolidation $T(P_1,\dots,P_n)$ is the distribution of $(X_1+\cdots+X_n)/n$ (or a weighted average), where $\{X_i\}$ are independent with distributions $\{P_i\}$, respectively. With this consolidation method, the variance of $T(P_1,\dots,P_n)$ is strictly smaller (unless $\{X_i\}$ are all constant) than the maximum variance of the $\{P_i\}$, since $\operatorname{var}(T(P_1,\dots,P_n)) = (\operatorname{var}(P_1)+\cdots+\operatorname{var}(P_n))/n^2$. Input data distributions that differ significantly, however, may sometimes reflect a higher uncertainty or variance. More fundamentally, in general this method requires averaging of completely dissimilar data, such as results from completely different experimental methods (see Section 6).

The method for consolidating distributional data presented below, called the conflation of distributions, and designated with the symbol "&" to suggest consolidation of $P_1$ and $P_2$, does not require ad hoc weights, and the mean and/or variance of the conflation may be larger or smaller than the means or variances of the input distributions. In general, conflation automatically gives more weight to input distributions arising from more accurate experiments, i.e., distributions with smaller standard deviations. The conflation of several distributions has several other properties that may be desirable for certain consolidation objectives: conflation minimizes the loss of Shannon information in consolidating the combined information from $P_1,\dots,P_n$ into a single distribution $Q$, and is both the unique minimax likelihood ratio consolidation and the unique proportional likelihood ratio consolidation of the given input distributions.

In addition, conflations of normal distributions are always normal, and coincide with the classical weighted least squares method, hence yielding best linear unbiased and maximum likelihood estimators. Many of the other important classical families of distributions, including gamma, beta, uniform, exponential, Pareto, Laplace, Bernoulli, zeta and geometric families, are also preserved under conflation. The conflation of distributions has a natural heuristic and practical interpretation: gather data (e.g., from independent laboratories) sequentially and simultaneously, and record the values only when the results (nearly) agree.
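The "record only when the results nearly agree" interpretation can be simulated directly. The sketch below is a hypothetical illustration, not a procedure from the paper: it assumes two Gaussian "laboratories", $N(0,1)$ and $N(1,2^2)$, keeps the first draw only when all draws lie within `eps` of one another, and compares the kept values with the precision-weighted Gaussian given by the weighted least squares rule (mean $0.2$, variance $0.8$ for these inputs). The parameter values, sample size, and seed are arbitrary choices.

```python
import random
import statistics


def precision_weighted(means, sds):
    """Mean and variance of the Gaussian conflation: weights 1/sd^2 (weighted least squares)."""
    weights = [1.0 / s**2 for s in sds]
    var = 1.0 / sum(weights)
    mean = var * sum(m * w for m, w in zip(means, weights))
    return mean, var


def record_when_agree(means, sds, eps, trials, seed=0):
    """Hypothetical simulation: each trial draws one result per laboratory and
    records the first result only if all results lie within eps of each other."""
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        xs = [rng.gauss(m, s) for m, s in zip(means, sds)]
        if max(xs) - min(xs) < eps:  # all pairwise differences below eps
            kept.append(xs[0])
    return kept


means, sds = [0.0, 1.0], [1.0, 2.0]
kept = record_when_agree(means, sds, eps=0.05, trials=400_000)
mc_mean, mc_var = statistics.fmean(kept), statistics.pvariance(kept)
print(len(kept), mc_mean, mc_var)
print(precision_weighted(means, sds))  # approximately (0.2, 0.8)
```

Only about 1.6% of the trials are kept here; shrinking `eps` sharpens the approximation at the cost of rejecting more draws.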

2 Basic Definition and Properties of Conflations

Throughout this article, $\mathbb{N}$ will denote the natural numbers, $\mathbb{Z}$ the integers, $\mathbb{R}$ the real numbers, $(a,b]$ the half-open interval $\{x\in\mathbb{R}: a<x\le b\}$, $\mathcal{B}$ the Borel subsets of $\mathbb{R}$, $\mathcal{P}$ the set of all real Borel probability measures, $\delta_x$ the Dirac delta measure in $\mathcal{P}$ at the point $x$ (i.e., $\delta_x(B) = 1$ if $x\in B$, and $=0$ if $x\notin B$), $\|\mu\|$ the total mass of the Borel sub-probability measure $\mu$, $o(\cdot)$ the standard "little oh" notation ($b_n = o(a_n)$ if and only if $\lim_{n\to\infty} b_n/a_n = 0$), a.c. means absolutely continuous, the p.m.f. of $P$ is the probability mass function ($p(k) = P(\{k\})$) if $P$ is discrete and the p.d.f. is the probability density function (Radon-Nikodym derivative) of $P$ if $P$ is a.c., $E(X)$ denotes the expected value of the random variable $X$, $\psi_P$ the characteristic function of $P\in\mathcal{P}$ (i.e., $\psi_P(t) = \int_{-\infty}^{\infty}e^{itx}\,dP(x)$), $I_A$ is the indicator function of the set $A$ (i.e., $I_A(x) = 1$ if $x\in A$ and $=0$ if $x\notin A$), $g\ast h$ is the convolution $(g\ast h)(t) = \int_{-\infty}^{\infty}g(t-s)h(s)\,ds$ of $g$ and $h$, and $A^c$ is the complement $\mathbb{R}\setminus A$ of the set $A$. For brevity, $\mu((a,b])$ will be written $\mu(a,b]$, $\mu(\{x\})$ as $\mu(x)$, etc.

Definition 2.1. For $P_1,\dots,P_n\in\mathcal{P}$ and $j\in\mathbb{N}$, $\mu_j(P_1,\dots,P_n)$ is the purely-atomic $j$-dyadic sub-probability measure
$$\mu_j(P_1,\dots,P_n) = \sum_{k\in\mathbb{Z}}\prod_{i=1}^n P_i\big((k-1)2^{-j}, k2^{-j}\big]\,\delta_{k2^{-j}}.$$
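Definition 2.1 can be evaluated directly from the distribution functions $F_i$, since $P_i((k-1)2^{-j}, k2^{-j}] = F_i(k2^{-j}) - F_i((k-1)2^{-j})$. The following Python sketch assumes each $P_i$ is supplied as a CDF and, as a practical truncation not present in the definition, only computes atoms inside a finite window. With one input it reproduces a discretization of $P$ (total mass about 1). With two standard normal inputs the total mass is small, roughly $2^{-j}\int\varphi^2 = 2^{-j}/(2\sqrt{\pi})$, where $\varphi$ is the standard normal density.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mu_j(cdfs, j, lo, hi):
    """Atoms of mu_j(P_1, ..., P_n) lying in the window (lo, hi].

    The atom at k/2^j has mass prod_i P_i((k-1)/2^j, k/2^j]
    = prod_i (F_i(k/2^j) - F_i((k-1)/2^j)).
    """
    h = 2.0 ** -j
    atoms = {}
    for k in range(math.floor(lo / h) + 1, math.floor(hi / h) + 1):
        mass = 1.0
        for F in cdfs:
            mass *= F(k * h) - F((k - 1) * h)
        atoms[k * h] = mass
    return atoms


j = 6
single = mu_j([normal_cdf], j, -10.0, 10.0)
pair = mu_j([normal_cdf, normal_cdf], j, -10.0, 10.0)
print(sum(single.values()))                                      # about 1
print(sum(pair.values()), 2.0 ** -j / (2 * math.sqrt(math.pi)))  # both about 0.0044
```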

Remark. The choice of using half-open dyadic intervals closed on the right, and of placing 
the mass in every dyadic interval at the right end point is not at all important — the results 
which follow also hold if other conventions are used, such as decimal or ternary half-open 
intervals closed on the left, with masses placed at the center. 

Example 2.2. If $P_1$ is a Bernoulli distribution with parameter $p = 1/3$ (i.e., $P_1 = (2\delta_0+\delta_1)/3$) and $P_2$ is Bernoulli with parameter $q$, then $\mu_j(P_1,P_2) = \big(2(1-q)\delta_0 + q\,\delta_1\big)/3$ for all $j\in\mathbb{N}$.

The next proposition is the basis for the definition of conflation of general distributions 
below. Recall (e.g. Theorem 4.4.1)) that for real Borel sub-probability measures {uj} 
and z/, the following are equivalent: 

u vaguely as j 00; (2.1a) 
z/(a, b] for all a < 6 in a dense set D cM.; (2.1b) 

j f{x)dv{x) (2.1c) 

for all continuous / that vanish at infinity. 



z/j(a,6] 
lim / f(x)duj{x) 
Theorem 2.3. For all $P, P_1,\dots,P_n\in\mathcal{P}$, writing $\mu_j = \mu_j(P_1,\dots,P_n)$,

(i) $\mu_{j+1}\big(\frac{a}{2^m},\frac{b}{2^m}\big] \le \mu_j\big(\frac{a}{2^m},\frac{b}{2^m}\big]$ for all $j,m\in\mathbb{N}$ with $j\ge m$, and all $a<b$, $a,b\in\mathbb{Z}$;

(ii) $\mu_j(P_1,\dots,P_n)$ converges vaguely to a sub-probability measure $\mu_\infty(P_1,\dots,P_n)$;

(iii) $\lim_{j\to\infty}\|\mu_j(P_1,\dots,P_n)\| = \|\mu_\infty(P_1,\dots,P_n)\|$; and

(iv) $\mu_\infty(P) = P$, and $\mu_j(P)$ converges vaguely to $P$ as $j\to\infty$.
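A quick numerical check of part (i), under assumptions of our own choosing: the Python sketch below evaluates $\mu_j(a2^{-m}, b2^{-m}]$ for increasing $j$, first for a standard normal and an exponential distribution on $(0,1]$, then for the binomial and Poisson inputs of Example 3.2 on $(-1,3]$. In the absolutely continuous case the masses are strictly decreasing and roughly halve at each level, which is one way to see that $\mu_\infty$ is the zero measure for these a.c. inputs. In the discrete case they are constant in $j$. The helper names are hypothetical, and the Poisson distribution is truncated to its first 40 atoms.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expon_cdf(x):
    """Exponential distribution with mean 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def discrete_cdf(pmf):
    """Distribution function of a finitely supported p.m.f. {atom: mass}."""
    return lambda x: sum(p for a, p in pmf.items() if a <= x)


def mu_j_interval(cdfs, j, a, b, m):
    """mu_j(P_1, ..., P_n)(a/2^m, b/2^m] for j >= m: sum of the j-dyadic atoms inside."""
    h = 2.0 ** -j
    total = 0.0
    for k in range(a * 2 ** (j - m) + 1, b * 2 ** (j - m) + 1):
        mass = 1.0
        for F in cdfs:
            mass *= F(k * h) - F((k - 1) * h)
        total += mass
    return total


levels = range(1, 11)
# a.c. inputs on (0, 1] = (0/2, 2/2]
ac = [mu_j_interval([normal_cdf, expon_cdf], j, 0, 2, 1) for j in levels]
# discrete inputs of Example 3.2 on (-1, 3] = (-2/2, 6/2]
binom = {0: 4 / 9, 1: 4 / 9, 2: 1 / 9}
poisson = {k: math.exp(-5) * 5**k / math.factorial(k) for k in range(40)}
disc = [mu_j_interval([discrete_cdf(binom), discrete_cdf(poisson)], j, -2, 6, 1) for j in levels]
print(ac)    # strictly decreasing, successive ratios close to 1/2
print(disc)  # the same value at every level
```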

The following simple observation, that the product of sums of nonnegative numbers is always at least as large as the sum of the products, will be used in the proof of the theorem and several times in the sequel, and is recorded here for ease of reference.

Lemma 2.4. For all $n\in\mathbb{N}$, all $a_{i,k}\ge 0$, and all $J\subset\mathbb{N}$, $\prod_{i=1}^n\sum_{k\in J}a_{i,k} \ge \sum_{k\in J}\prod_{i=1}^n a_{i,k}$.
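Proof of Theorem 2.3. For (i), note that for $j\ge m$,
$$\mu_j\Big(\frac{a}{2^m},\frac{b}{2^m}\Big] = \sum_{k=a2^{j-m}}^{b2^{j-m}-1}\mu_j\Big(\frac{k}{2^j},\frac{k+1}{2^j}\Big] \tag{2.2a}$$
and
$$\mu_{j+1}\Big(\frac{a}{2^m},\frac{b}{2^m}\Big] = \sum_{k=a2^{j-m}}^{b2^{j-m}-1}\mu_{j+1}\Big(\frac{k}{2^j},\frac{k+1}{2^j}\Big]. \tag{2.2b}$$
By the definition of $\mu_j$,
$$\mu_j\Big(\frac{k}{2^j},\frac{k+1}{2^j}\Big] = \prod_{i=1}^n P_i\Big(\frac{k}{2^j},\frac{k+1}{2^j}\Big] = \prod_{i=1}^n\Big(P_i\Big(\frac{2k}{2^{j+1}},\frac{2k+1}{2^{j+1}}\Big] + P_i\Big(\frac{2k+1}{2^{j+1}},\frac{2k+2}{2^{j+1}}\Big]\Big) \tag{2.3a}$$
and
$$\mu_{j+1}\Big(\frac{k}{2^j},\frac{k+1}{2^j}\Big] = \prod_{i=1}^n P_i\Big(\frac{2k}{2^{j+1}},\frac{2k+1}{2^{j+1}}\Big] + \prod_{i=1}^n P_i\Big(\frac{2k+1}{2^{j+1}},\frac{2k+2}{2^{j+1}}\Big]. \tag{2.3b}$$
By Lemma 2.4, (2.3a) and (2.3b) imply that
$$\mu_{j+1}\Big(\frac{k}{2^j},\frac{k+1}{2^j}\Big] \le \mu_j\Big(\frac{k}{2^j},\frac{k+1}{2^j}\Big]\quad\text{for all } j\ge m,\ j,m\in\mathbb{N},\ k\in\mathbb{Z}. \tag{2.4}$$
By (2.2a), (2.2b) and (2.4), this implies (i).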

For (ii), note that since every sequence of sub-probability measures contains a subsequence that converges vaguely to a sub-probability measure (e.g. [?, Theorem 4.3.3]), there exists a subsequence $\{\mu_{j_k}(P_1,\dots,P_n)\}$ of $\{\mu_j(P_1,\dots,P_n)\}$ and a sub-probability measure $\mu_\infty(P_1,\dots,P_n)$ so that $\mu_{j_k}(P_1,\dots,P_n)$ converges vaguely to $\mu_\infty(P_1,\dots,P_n)$ as $k\to\infty$. Hence by the uniqueness of vague limits (i.e. convergence on intervals from different dense sets results in the same limit measure [?, corollary to Theorem 4.3.1, p. 86]), (i) implies that
$$\lim_{j\to\infty}\mu_j\Big(\frac{a}{2^m},\frac{b}{2^m}\Big] = \mu_\infty\Big(\frac{a}{2^m},\frac{b}{2^m}\Big]\quad\text{for all } a<b,\ a,b\in\mathbb{Z},\ m\in\mathbb{N},$$
which proves that $\mu_j(P_1,\dots,P_n)$ converges vaguely to $\mu_\infty(P_1,\dots,P_n)$.



For (iii), note that
$$\lim_{j\to\infty}\|\mu_j\| = \lim_{j\to\infty}\sum_{k=-\infty}^{\infty}\mu_j(k,k+1] = \sum_{k=-\infty}^{\infty}\lim_{j\to\infty}\mu_j(k,k+1] = \sum_{k=-\infty}^{\infty}\mu_\infty(k,k+1] = \|\mu_\infty\|,$$
where the second equality follows by the dominated convergence theorem, and the third by the definition of $\mu_\infty$. The special case $n = 1$ of (iv) is immediate. □

Definition 2.5. $P_1,\dots,P_n\in\mathcal{P}$ are (mutually) compatible if $\|\mu_j(P_1,\dots,P_n)\| > 0$ for all $j\in\mathbb{N}$.

Clearly every normal distribution is compatible with every probability distribution, every exponential distribution is compatible with every distribution with support in the positive reals, and every geometric distribution is compatible with every discrete distribution having any atoms in $\mathbb{N}$. Even though Theorem 2.3 guarantees that $\mu_j(P_1,\dots,P_n)$ converges vaguely to a sub-probability measure $\mu_\infty(P_1,\dots,P_n)$ and that $\lim_{j\to\infty}\|\mu_j(P_1,\dots,P_n)\| = \|\mu_\infty(P_1,\dots,P_n)\|$, and compatibility implies that $\mu_j(P_1,\dots,P_n)/\|\mu_j(P_1,\dots,P_n)\|$ is a probability measure for all $j\in\mathbb{N}$, the vague limit of $\mu_j(P_1,\dots,P_n)/\|\mu_j(P_1,\dots,P_n)\|$ as $j\to\infty$ may not be a probability measure, as the next example shows.

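Example 2.6. Let $P_1 = \sum_{k\in\mathbb{N}}2^{-k}\delta_k$, and $P_2 = \sum_{k\in\mathbb{N}}\cdots$. Then $P_1$ and $P_2$ are easily seen to be compatible, but $\lim_{j\to\infty}\mu_j(P_1,P_2)/\|\mu_j(P_1,P_2)\|$ is the zero measure, since for each $j\in\mathbb{N}$, the support of the probability measure $\mu_j(P_1,P_2)/\|\mu_j(P_1,P_2)\|$ is contained in $[j,\infty)$.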

The next definition is the main definition in this paper. 

Definition 2.7. If $\mu_j(P_1,\dots,P_n)/\|\mu_j(P_1,\dots,P_n)\|$ converges vaguely to a Borel probability measure $Q$ as $j\to\infty$, this limit $Q$ is called the conflation of $P_1,\dots,P_n$, written $\&(P_1,\dots,P_n)$.
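Definition 2.7 suggests a direct numerical approximation: compute $\mu_j(P_1,\dots,P_n)$ on a fine dyadic grid and normalize. The Python sketch below does this for two standard normal inputs over a truncated window $(-10, 10]$ (a truncation assumption, not part of the definition). The normalizing constant $\|\mu_j\|$ tends to zero, but the normalized measure has variance close to $1/2$, consistent with Example 3.8 below.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normalized_mu_j(cdfs, j, lo=-10.0, hi=10.0):
    """mu_j / ||mu_j|| restricted to the window (lo, hi]; also returns ||mu_j||."""
    h = 2.0 ** -j
    atoms, masses = [], []
    for k in range(math.floor(lo / h) + 1, math.floor(hi / h) + 1):
        mass = 1.0
        for F in cdfs:
            mass *= F(k * h) - F((k - 1) * h)
        atoms.append(k * h)
        masses.append(mass)
    norm = sum(masses)
    if norm == 0.0:
        raise ValueError("mu_j has no mass in the window")
    return atoms, [m / norm for m in masses], norm


def mean_var(atoms, probs):
    mean = sum(x * p for x, p in zip(atoms, probs))
    return mean, sum((x - mean) ** 2 * p for x, p in zip(atoms, probs))


for j in (2, 4, 6, 8, 10):
    atoms, probs, norm = normalized_mu_j([normal_cdf, normal_cdf], j)
    print(j, norm, mean_var(atoms, probs))  # norm -> 0, variance -> 1/2
```

Placing each atom at the right end point shifts the mean by about $2^{-j-1}$, which is negligible at $j = 10$.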

Theorem 2.8. The operation $\&$ is commutative and associative, that is, $\&(P_1,P_2) = \&(P_2,P_1)$ and $\&(P_1,\&(P_2,P_3)) = \&(\&(P_1,P_2),P_3) = \&(P_1,P_2,P_3)$.
Proof. Immediate from the definition of $\mu_\infty$, since multiplication of real numbers is commutative and associative. □

Example 2.9. Let $P_1$ and $P_2$ be the Bernoulli distributions of Example 2.2. Since $\mu_j(P_1,P_2)$ is the same for all $j\in\mathbb{N}$, $\&(P_1,P_2) = \mu_1(P_1,P_2)/\|\mu_1(P_1,P_2)\|$, the normalized product of the two p.m.f.'s.
Example 2.10. Let $P_1$ be $N(0,1)$ and $P_2$ be Bernoulli with parameter $p = 1/3$. Then it can easily be seen that $\mu_j(P_1,P_2)/\|\mu_j(P_1,P_2)\|$ converges vaguely to
$$\&(P_1,P_2) = \frac{2}{2+e^{-1/2}}\,\delta_0 + \frac{e^{-1/2}}{2+e^{-1/2}}\,\delta_1,$$
that is, to the probability measure having the same atoms as the discrete measure, weighted according to the product of the atom masses of $P_2$ and the magnitude of the density of $P_1$ at 0 and 1.
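The weights in Example 2.10 can be checked with the same dyadic construction. In this Python sketch (the grid level $j = 14$ and the window $(-1, 2]$ are arbitrary choices), only the two $j$-dyadic intervals containing 0 and 1 receive mass from the Bernoulli input, and their normalized masses approach $2/(2+e^{-1/2})$ and $e^{-1/2}/(2+e^{-1/2})$.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bernoulli_cdf(p):
    """Distribution function of a Bernoulli(p) variable."""
    return lambda x: 0.0 if x < 0 else (1.0 - p if x < 1 else 1.0)


def normalized_mu_j(cdfs, j, lo, hi):
    """Nonzero atoms of mu_j / ||mu_j|| in the window (lo, hi]."""
    h = 2.0 ** -j
    masses = {}
    for k in range(math.floor(lo / h) + 1, math.floor(hi / h) + 1):
        m = 1.0
        for F in cdfs:
            m *= F(k * h) - F((k - 1) * h)
        if m > 0.0:
            masses[k * h] = m
    norm = sum(masses.values())
    return {x: m / norm for x, m in masses.items()}


approx = normalized_mu_j([normal_cdf, bernoulli_cdf(1 / 3)], j=14, lo=-1.0, hi=2.0)
exact0 = 2 / (2 + math.exp(-0.5))
print(approx, exact0)  # mass at 0.0 close to exact0, mass at 1.0 close to 1 - exact0
```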



3 Conflations of Discrete and of Absolutely Continuous Distributions

In general, explicit representations of conflations are not known in closed form. For large natural classes of distributions, however, such as collections of discrete distributions with common atoms and collections of a.c. distributions with overlapping densities, explicit forms of the conflations are easy to obtain. The next two theorems give simple and powerful characterizations of conflations in those two cases. Since in practice input data can easily be approximated extremely closely by discrete distributions with common atoms (e.g., by replacing each $P_i$ by the dyadic approximation $\mu_j(P_i)$ above), or can be smoothed (e.g., by convolution with a $U(-\epsilon,\epsilon)$ or an $N(0,\epsilon^2)$ variable), these two cases are of practical interest. The third conclusion in each of the next two theorems also yields the heuristic and useful interpretation of conflation described in the introduction.

Theorem 3.1. Let $P_1,\dots,P_n$ be discrete with p.m.f.'s $p_1,\dots,p_n$, respectively, and common atoms $A$, where $\emptyset\ne A\subset\mathbb{R}$. Then $\&(P_1,\dots,P_n)$ exists, and the following are equivalent:

(i) $Q = \&(P_1,\dots,P_n)$;

(ii) $Q$ is discrete with p.m.f. $q(x) = \dfrac{\prod_{i=1}^n p_i(x)}{\sum_{y\in A}\prod_{i=1}^n p_i(y)}$ for all $x\in\mathbb{R}$;

(iii) $Q$ is the conditional distribution of $X_1$ given that $X_1 = X_2 = \cdots = X_n$, where $X_1,\dots,X_n$ are independent r.v.'s with distributions $P_1,\dots,P_n$, respectively.

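Proof. Fix $P_1,\dots,P_n$ and note that by definition of atom, $p_i(x) > 0$ for all $i = 1,\dots,n$ and all $x\in A$. Fix $k_0\in\mathbb{Z}$ and $j_0\in\mathbb{N}$, and let $D = \big(\frac{k_0}{2^{j_0}},\frac{k_0+1}{2^{j_0}}\big]$. First it will be shown that
$$\mu_\infty(D) = \sum_{x\in A\cap D}\prod_{i=1}^n p_i(x). \tag{3.1}$$
For all $x\in\mathbb{R}$ and $j\in\mathbb{N}$, let $D_{x,j}$ denote the unique $j$-dyadic interval $\big(\frac{k-1}{2^j},\frac{k}{2^j}\big]$ containing $x$. Note that $D_{x,j}\searrow\{x\}$ as $j\to\infty$, so $P_i(D_{x,j})\searrow p_i(x)$ as $j\to\infty$ for all $i$ and all $x\in\mathbb{R}$. This implies
$$\lim_{j\to\infty}\prod_{i=1}^n P_i(D_{x,j}) = \prod_{i=1}^n p_i(x)\quad\text{for all } x\in\mathbb{R}. \tag{3.2}$$
Fix $\epsilon > 0$. Since $\{P_i\}$ are discrete, there exists a finite set $A_0\subset\mathbb{R}$ such that
$$P_i(D\cap A_0^c) < \epsilon\quad\text{for all } i\in\{1,\dots,n\}. \tag{3.3}$$
Since $\prod_{i=1}^n p_i(x)\le p_1(x)$ for all $x\in\mathbb{R}$, (3.3) implies
$$0 \le \sum_{x\in A\cap D}\prod_{i=1}^n p_i(x) - \sum_{x\in A\cap A_0\cap D}\prod_{i=1}^n p_i(x) = \sum_{x\in A\cap A_0^c\cap D}\prod_{i=1}^n p_i(x) \le \sum_{x\in A\cap A_0^c\cap D}p_1(x) \le P_1(D\cap A_0^c) < \epsilon. \tag{3.4}$$
For each $j\in\mathbb{N}$, let $S_j = \bigcup_{x\in A_0}D_{x,j}$. Then since $x\in D_{x,j}$ for all $x$ and $j$, $S_j^c\subset A_0^c$, so (3.3) implies $P_i(D\cap S_j^c) < \epsilon$ for all $i\in\{1,\dots,n\}$. Thus by definition of $\{\mu_j\}$ and Lemma 2.4,
$$\mu_j(D\cap S_j^c) \le \prod_{i=1}^n P_i(D\cap S_j^c) < \epsilon^n\quad\text{for all } j\ge j_0. \tag{3.5}$$
For all $j\ge j_0$ large enough that the intervals $D_{x,j}$, $x\in A_0$, are pairwise disjoint, this implies that
$$\mu_j(D) = \mu_j(D\cap S_j) + \mu_j(D\cap S_j^c) = \sum_{x\in D\cap A_0}\prod_{i=1}^n P_i(D_{x,j}) + \mu_j(D\cap S_j^c), \tag{3.6}$$
where the second equality follows from the definitions of $S_j$ and $D_{x,j}$. Since $x\in D_{x,j}$, (3.5) and (3.6) imply
$$\sum_{x\in D\cap A_0}\prod_{i=1}^n p_i(x) \le \mu_j(D) \le \sum_{x\in D\cap A_0}\prod_{i=1}^n P_i(D_{x,j}) + \epsilon^n. \tag{3.7}$$
By (3.2), (3.7) and the finiteness of $A_0$,
$$\Big|\mu_j(D) - \sum_{x\in A_0\cap D}\prod_{i=1}^n p_i(x)\Big| < \epsilon + \epsilon^n\quad\text{for sufficiently large } j. \tag{3.8}$$
Since $\prod_{i=1}^n p_i(x) = 0$ for $x\notin A$, (3.4) and (3.8) imply $\big|\mu_j(D) - \sum_{x\in A\cap D}\prod_{i=1}^n p_i(x)\big| < 2\epsilon + \epsilon^n$ for sufficiently large $j$. Since $\epsilon > 0$ was arbitrary and since $\mu_j\to\mu_\infty$, this implies (3.1). Since $D$ was arbitrary, (3.1) implies that $\|\mu_\infty(P_1,\dots,P_n)\| = \sum_{x\in A}\prod_{i=1}^n p_i(x) > 0$, which proves that $\&(P_1,\dots,P_n)$ exists. The equivalence of (i) and (ii) follows since $\&(P_1,\dots,P_n) = \mu_\infty/\|\mu_\infty\|$, and since the measures of dyadic intervals $D$ determine $\mu_\infty$. The equivalence of (ii) and (iii) follows immediately from the definition of conditional probability. □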
Example 3.2. If $P_1$ is binomial with parameters $n = 2$ and $p = 1/3$ and $P_2$ is Poisson with parameter $\lambda = 5$, then $\&(P_1,P_2)$ is discrete with atoms only at 0, 1 and 2; specifically,
$$\&(P_1,P_2) = \frac{8\delta_0 + 40\delta_1 + 25\delta_2}{73}.$$
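Theorem 3.1(ii) reduces the conflation of discrete inputs to a normalized product of p.m.f.'s over the common atoms. A minimal Python sketch, assuming each p.m.f. is given as a dict from atoms to masses, reproduces Example 3.2. The Poisson p.m.f. is truncated to its first 60 atoms, which does not affect the common atoms here. The function raises an error when there are no common atoms, in line with the remark that follows.

```python
import math


def conflate_discrete(pmfs):
    """Theorem 3.1(ii): normalized product of p.m.f.s {atom: mass} over the common atoms."""
    common = set.intersection(*(set(p) for p in pmfs))
    products = {x: math.prod(p[x] for p in pmfs) for x in common}
    total = sum(products.values())
    if total == 0:
        raise ValueError("no common atoms: the conflation does not exist")
    return {x: products[x] / total for x in sorted(products)}


binomial = {k: math.comb(2, k) * (1 / 3) ** k * (2 / 3) ** (2 - k) for k in range(3)}
poisson = {k: math.exp(-5) * 5**k / math.factorial(k) for k in range(60)}
q = conflate_discrete([binomial, poisson])
print(q)  # approximately {0: 8/73, 1: 40/73, 2: 25/73}
```

The common factor $e^{-5}/9$ cancels in the normalization, which is why the exact answer has the rational masses $8/73$, $40/73$ and $25/73$.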

Remark. It should be noted that if the input distributions are discrete and have no common 
atoms, then the conflation does not exist. This could happen if, for example, the underlying 
experiments were designed to estimate Avogadro's number (theoretically a 24-digit integer), 
and the results were given as exact integers. In practice, however, Avogadro's number is 
known only to seven decimal places, and if the results of the experiments were reported or 
recorded to eight or nine decimal places of accuracy, then there would almost certainly be 
common values, and the conflation would be well defined and meaningful. (Restriction to the 
desired decimal accuracy could be done by the experimenter, or afterwards, e.g., by converting
each input $P_i$ to $\mu_{20}(P_i)$ as mentioned above.)

The analog of Theorem 3.1 for probability distributions with densities requires an additional hypothesis on the density functions, for the simple reason that the product of a finite number of p.m.f.'s is always the mass function of a discrete sub-probability measure (i.e., is always summable), but the product of a finite number of p.d.f.'s may not be the density function of a finite a.c. measure (i.e., may not be integrable), as will be seen in Example 3.6 below.

The algebraic and Hilbert space properties of normalized products of density functions have been studied for special classes of a.c. distributions with p.d.f.'s with compact support that are bounded from above and bounded from below away from zero ([?], [?]); products of p.m.f.'s and p.d.f.'s have been used in certain pattern-recognition problems ([8]); and the "log opinion pool" method for combining probability distributions ([?]) is an a.c. distribution with normalized density $\prod_{i=1}^n f_i^{w_i}$, which is similar in structure, but is idempotent since the weights sum to one.

Theorem 3.3. Let $P_1,P_2,\dots,P_n$ be absolutely continuous with densities $f_1,\dots,f_n$ satisfying $0 < \int_{-\infty}^{\infty}\prod_{i=1}^n f_i(x)\,dx < \infty$. Then $\&(P_1,\dots,P_n)$ exists and the following are equivalent:

(i) $Q = \&(P_1,\dots,P_n)$;

(ii) $Q$ is absolutely continuous with density $f(x) = \dfrac{\prod_{i=1}^n f_i(x)}{\int_{-\infty}^{\infty}\prod_{i=1}^n f_i(y)\,dy}$;

(iii) $Q$ is the (vague) limit, as $\epsilon\searrow 0$, of the conditional distribution of $X_1$ given that $|X_i - X_j| < \epsilon$ for all $i,j\in\{1,\dots,n\}$, where $X_1,\dots,X_n$ are independent r.v.'s with distributions $P_1,\dots,P_n$, respectively.
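For a.c. inputs, part (ii) can be checked numerically by integrating the product of the densities. The Python sketch below uses a midpoint rule on a truncated interval (both assumptions of the sketch) for $N(0,1)$ and $N(1,2^2)$, and compares the resulting mean and variance with the precision-weighted Gaussian described in the abstract (weights $1/\sigma_i^2$). For these inputs the normalizing integral $\int f_1f_2$ is the $N(0,5)$ density evaluated at $1$, that is, $e^{-1/10}/\sqrt{10\pi}$.

```python
import math


def normal_pdf(mean, sd):
    c = 1.0 / (sd * math.sqrt(2.0 * math.pi))
    return lambda x: c * math.exp(-0.5 * ((x - mean) / sd) ** 2)


def conflation_moments(pdfs, lo, hi, n=100_000):
    """Normalizer, mean and variance of prod_i f_i / int prod_i f_i (midpoint rule on [lo, hi])."""
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    ws = [math.prod(f(x) for f in pdfs) for x in xs]
    z = sum(ws) * h
    mean = sum(x * w for x, w in zip(xs, ws)) * h / z
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, ws)) * h / z
    return z, mean, var


def precision_weighted(means, sds):
    weights = [1.0 / s**2 for s in sds]
    var = 1.0 / sum(weights)
    return var * sum(m * w for m, w in zip(means, weights)), var


z, mean, var = conflation_moments([normal_pdf(0.0, 1.0), normal_pdf(1.0, 2.0)], -20.0, 20.0)
print(z, math.exp(-0.1) / math.sqrt(10 * math.pi))              # int f1 f2
print((mean, var), precision_weighted([0.0, 1.0], [1.0, 2.0]))  # both about (0.2, 0.8)
```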

Proof. First suppose that the densities $\{f_i\}$ are nonnegative simple functions on half-open dyadic intervals $(a,b]$, $a,b\in\{k2^{-j}: k\in\mathbb{Z}, j\in\mathbb{N}\}$. Without loss of generality (splitting the intervals if necessary), there exist $j_0\in\mathbb{N}$ and a finite set $K\subset\mathbb{Z}$ such that
$$f_i = \sum_{k\in K}c_{i,k}\,I_{D_k}\quad\text{for all } i = 1,\dots,n, \tag{3.9}$$
where $c_{i,k}\ge 0$ for all $i,k$, and $\{D_k\}$ are the disjoint intervals $(a_k, a_k + 2^{-j_0}]$, $a_k = k2^{-j_0}$, $k\in K$. Let $\pi_k = \prod_{i=1}^n c_{i,k}$ for all $k\in K$, and note that the hypothesis $\int\prod_{i=1}^n f_i(x)\,dx > 0$ implies that $\sum_{k\in K}\pi_k > 0$. It will be shown that $\&(P_1,\dots,P_n)$ is absolutely continuous with density $f$, where
$$f = \frac{\prod_{i=1}^n f_i}{\int_{-\infty}^{\infty}\prod_{i=1}^n f_i(y)\,dy} = \frac{\sum_{k\in K}\pi_k I_{D_k}}{2^{-j_0}\sum_{k\in K}\pi_k}\quad\text{a.s.} \tag{3.10}$$
Fix $m\in\mathbb{N}$, and let $a_{k,s} = a_k + s2^{-(j_0+m)}$. First note that since $f_i = c_{i,k}$ a.s. on $D_k$ for each $i$ and $k$,
$$\pi_k = (2^{j_0+m})^n\prod_{i=1}^n P_i(a_{k,s-1}, a_{k,s}] \tag{3.11}$$
for all $s = 1,\dots,2^m$; $m\in\mathbb{N}$; and $k\in K$. By (3.11), and the definitions of $\{D_k\}$ and $\{\mu_j\}$,
$$\mu_{j_0+m} = \sum_{k\in K}\sum_{s=1}^{2^m}\prod_{i=1}^n P_i(a_{k,s-1},a_{k,s}]\,\delta_{a_{k,s}} = (2^{j_0+m})^{-n}\sum_{k\in K}\pi_k\sum_{s=1}^{2^m}\delta_{a_{k,s}}. \tag{3.12}$$
Since $m$, $j_0$ and $n$ are fixed, and since $\big\|\sum_{s=1}^{2^m}\delta_{a_{k,s}}\big\| = 2^m$, (3.12) implies that
$$\frac{\mu_{j_0+m}}{\|\mu_{j_0+m}\|} = \sum_{k\in K}\frac{\pi_k}{\sum_{l\in K}\pi_l}\Big(2^{-m}\sum_{s=1}^{2^m}\delta_{a_{k,s}}\Big). \tag{3.13}$$
But since $2^{-m}\sum_{s=1}^{2^m}\delta_{a_{k,s}}$ converges vaguely as $m\to\infty$ to the probability measure uniformly distributed on $D_k$ for each $k\in K$, (3.13) implies that $\mu_{j_0+m}/\|\mu_{j_0+m}\|$ converges vaguely to the probability measure with density $f$ in (3.10). This completes the proof that $\&(P_1,\dots,P_n)$ exists and (i) and (ii) are equivalent when the densities are simple functions on dyadic intervals. For the general case, use the standard method to extend this result to general simple functions, and then, since densities are a.s. nonnegative, extend this to finite collections of densities whose product is integrable, via the standard argument of approximating below by simple functions, and using monotone convergence.

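For the equivalence of (ii) and (iii), for every $\epsilon > 0$ let $P_{1,\epsilon}$ denote the conditional distribution of $X_1$ given $\{|X_i - X_j| < \epsilon$ for all $i,j\in\{1,\dots,n\}\}$, that is, for all Borel sets $A$,
$$P_{1,\epsilon}(A) = \frac{P(X_1\in A\text{ and }|X_i - X_j| < \epsilon\text{ for all } i,j\in\{1,\dots,n\})}{P(|X_i - X_j| < \epsilon\text{ for all } i,j\in\{1,\dots,n\})},$$
where the denominator is always strictly positive since by hypothesis $\int\prod_{i=1}^n f_i(x)\,dx > 0$. Clearly, $P_{1,\epsilon}$ is absolutely continuous with conditional density $f_{1,\epsilon}$, where the independence of the $\{X_i\}$ implies that
$$f_{1,\epsilon}(x) = \frac{f_1(x)\prod_{i=2}^n(2\epsilon)^{-1}\int_{x-\epsilon}^{x+\epsilon}f_i(z)\,dz}{\int_{-\infty}^{\infty}f_1(y)\prod_{i=2}^n(2\epsilon)^{-1}\int_{y-\epsilon}^{y+\epsilon}f_i(z)\,dz\,dy}. \tag{3.14}$$
Next note that by the definition of derivative and integral,
$$\lim_{\epsilon\searrow 0}f_1(x)\prod_{i=2}^n(2\epsilon)^{-1}\int_{x-\epsilon}^{x+\epsilon}f_i(z)\,dz = \prod_{i=1}^n f_i(x)\quad\text{for almost all } x. \tag{3.15}$$
Letting $f_i^M = \min\{f_i, M\}$ for all $M\in\mathbb{N}$ and all $i\in\{1,\dots,n\}$, calculate
$$\begin{aligned}\lim_{\epsilon\searrow 0}\int_{-\infty}^{\infty}f_1(y)\prod_{i=2}^n(2\epsilon)^{-1}\int_{y-\epsilon}^{y+\epsilon}f_i(z)\,dz\,dy &= \lim_{\epsilon\searrow 0}\lim_{M\to\infty}\int_{-\infty}^{\infty}f_1^M(y)\prod_{i=2}^n(2\epsilon)^{-1}\int_{y-\epsilon}^{y+\epsilon}f_i^M(z)\,dz\,dy\\ &= \lim_{M\to\infty}\lim_{\epsilon\searrow 0}\int_{-\infty}^{\infty}f_1^M(y)\prod_{i=2}^n(2\epsilon)^{-1}\int_{y-\epsilon}^{y+\epsilon}f_i^M(z)\,dz\,dy\\ &= \lim_{M\to\infty}\int_{-\infty}^{\infty}\prod_{i=1}^n f_i^M(y)\,dy = \int_{-\infty}^{\infty}\prod_{i=1}^n f_i(y)\,dy,\end{aligned} \tag{3.16}$$
where the first equality follows from the monotone convergence theorem, the second since the convergence of $\lim_{\epsilon\searrow 0}\int f_1^M(y)\big(\prod_{i=2}^n(2\epsilon)^{-1}\int_{y-\epsilon}^{y+\epsilon}f_i^M(z)\,dz\big)dy$ is uniform in $M$, the third by (3.15) and the bounded convergence theorem since the integrand is bounded by $M^n$, and the last by the dominated convergence theorem since by hypothesis
$$\int_{-\infty}^{\infty}\prod_{i=1}^n f_i(x)\,dx < \infty.$$
Thus by (3.14), (3.15) and (3.16),
$$\lim_{\epsilon\searrow 0}f_{1,\epsilon}(x) = \frac{\prod_{i=1}^n f_i(x)}{\int_{-\infty}^{\infty}\prod_{i=1}^n f_i(y)\,dy}\quad\text{for almost all } x,$$
proving the equivalence of (ii) and (iii). □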

Example 3.4. Suppose $P_1$ is $N(0,1)$ and $P_2$ is exponentially distributed with mean 1. Then $\&(P_1,P_2)$ is a.c. with p.d.f. $f(x)$ proportional to $e^{-x^2/2}e^{-x} = e^{1/2}e^{-(x+1)^2/2}$ for $x > 0$, which is simply the standard normal shifted to the left one unit, and conditioned to be nonnegative.
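Example 3.4 can be confirmed numerically. This Python sketch (midpoint rule on $(0, 20]$, an arbitrary truncation) integrates the unnormalized density $\varphi(x)e^{-x}$ and compares its normalizing constant and mean with the closed forms for an $N(-1,1)$ variable conditioned to be nonnegative, namely $e^{1/2}(1-\Phi(1))$ and $-1 + \varphi(1)/(1-\Phi(1))$, where $\varphi$ and $\Phi$ denote the standard normal density and distribution function.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g(x):
    """Unnormalized conflation density: N(0,1) density times Exp(1) density (x > 0)."""
    return phi(x) * math.exp(-x) if x > 0 else 0.0


h, n = 1e-4, 200_000  # midpoint rule on (0, 20]
xs = [(i + 0.5) * h for i in range(n)]
vals = [g(x) for x in xs]
z = sum(vals) * h
mean = sum(x * v for x, v in zip(xs, vals)) * h / z

z_closed = math.exp(0.5) * (1.0 - Phi(1.0))
mean_closed = -1.0 + phi(1.0) / (1.0 - Phi(1.0))
print(z, z_closed)        # about 0.2616
print(mean, mean_closed)  # about 0.5251
```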

Example 3.5. Suppose $P_1$ and $P_2$ are both standard Cauchy distributions. Then neither $P_1$ nor $P_2$ has a finite mean, but by Theorem 3.3, $\&(P_1,P_2)$ is a.c. with density $f(x) = c(1+x^2)^{-2}$ for some $c > 0$, and since $\int_{-\infty}^{\infty}x^2(1+x^2)^{-2}\,dx < \infty$, $\&(P_1,P_2)$ has both finite mean and variance. In particular, the conflation of Cauchy distributions is not Cauchy, in contrast to the closure of many classical families under conflation (Theorem 7.1 below). This example also shows that the classes of stable and infinitely divisible distributions are not closed under conflation.
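The finite variance in Example 3.5 can be verified by quadrature once the heavy tails are tamed. The Python sketch below uses the substitution $x = \tan t$ (our choice, not the paper's), so that the integrals over $\mathbb{R}$ become integrals of smooth functions over $(-\pi/2, \pi/2)$. For two standard Cauchy inputs it recovers $\int f_1f_2 = 1/(2\pi)$, hence $c = 2/\pi$, and a conflation with mean 0 and variance 1.

```python
import math


def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))


def moments_via_tan(pdfs, n=100_000):
    """Normalizer, mean, variance of prod_i f_i / int prod_i f_i.

    Uses x = tan(t), dx = dt / cos(t)^2, and the midpoint rule on (-pi/2, pi/2).
    """
    h = math.pi / n
    z = m1 = m2 = 0.0
    for i in range(n):
        t = -math.pi / 2 + (i + 0.5) * h
        x = math.tan(t)
        w = math.prod(f(x) for f in pdfs) / math.cos(t) ** 2
        z += w * h
        m1 += x * w * h
        m2 += x * x * w * h
    mean = m1 / z
    return z, mean, m2 / z - mean**2


z, mean, var = moments_via_tan([cauchy_pdf, cauchy_pdf])
print(z, 1 / (2 * math.pi))  # int f1 f2 = 1/(2 pi), so c = 2/pi
print(mean, var)             # about 0 and 1
```

After the substitution the integrands become $\cos^2 t/\pi^2$, $\sin t\cos t/\pi^2$ and $\sin^2 t/\pi^2$, which the midpoint rule integrates essentially exactly.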
In general, the conflation of a.c. distributions, even of an a.c. distribution with itself, may not be a.c., let alone have a density proportional to the product of the densities.

Example 3.6. Let $P_1 = P_2$ be a.c. with p.d.f. $f(x) = (4x)^{-1/2}$ for $0 < x < 1$ (and zero elsewhere). Then $f_1(x)f_2(x) = (4x)^{-1}$ is not integrable, and no scalar multiple is a p.d.f. However, the conflation $\&(P_1,P_2)$ does exist, and by showing that the normalized mass of $\mu_j$ is moving to the left as $j\to\infty$ it can be seen that $\&(P_1,P_2) = \delta_0$, the Dirac delta measure at zero (in particular, the conflation is not even a.c.).
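The drift of mass toward zero in Example 3.6 is slow. This Python sketch uses the distribution function $F(x) = \sqrt{x}$ on $[0,1]$ of the density $(4x)^{-1/2}$, so the atom of $\mu_j(P_1,P_2)$ at $k2^{-j}$ carries mass $2^{-j}(\sqrt{k}-\sqrt{k-1})^2 \approx 2^{-j}/(4k)$. The normalized mass on $(0, 0.01]$ therefore increases with $j$, but only logarithmically in $2^j$. The cutoff $0.01$ and the levels $j$ are arbitrary choices.

```python
import math


def mass_near_zero(j, eps=0.01):
    """Fraction of the normalized mu_j(P, P) on (0, eps], for P with F(x) = sqrt(x) on [0, 1].

    The atom at k/2^j has mass (sqrt(k/2^j) - sqrt((k-1)/2^j))^2 = 2^-j (sqrt(k) - sqrt(k-1))^2,
    and the common factor 2^-j cancels after normalization.
    """
    weights = [(math.sqrt(k) - math.sqrt(k - 1)) ** 2 for k in range(1, 2**j + 1)]
    cutoff = int(eps * 2**j)  # atoms k/2^j <= eps
    return sum(weights[:cutoff]) / sum(weights)


fractions_near_zero = [mass_near_zero(j) for j in (8, 12, 16)]
print(fractions_near_zero)  # increasing toward 1, but slowly
```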

The characterization of the conflation of a.c. distributions as the normalized product of the density functions yields another characterization of conflations of a.c. distributions, an analog of the classical convolution theorem in Fourier analysis ([?]).

Recall that $g\ast h$ is the convolution of $g$ and $h$.

Theorem 3.7 (Convolution theorem for conflations). Let $P_1,P_2,\dots,P_n$ be compatible and a.c. with densities $\{f_i\}$ and characteristic functions $\{\psi_i\}$. If $0 < \int_{-\infty}^{\infty}\prod_{i=1}^n f_i(x)\,dx < \infty$ and $\{\psi_i\}$ are $L^1$, then $\&(P_1,\dots,P_n)$ exists and is the unique a.c. probability distribution with characteristic function
$$\psi_{\&(P_1,\dots,P_n)} = \frac{\psi_1\ast\psi_2\ast\cdots\ast\psi_n}{(2\pi)^{n-1}\int_{-\infty}^{\infty}\prod_{i=1}^n f_i(x)\,dx}.$$

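Proof. The proof will be given only for the case $n = 2$; the general case follows easily by induction and Theorem 2.8. Suppose $\psi_1$ and $\psi_2$ are $L^1$ and $0 < \int_{-\infty}^{\infty}f_1(x)f_2(x)\,dx < \infty$. Then
$$\begin{aligned}(\psi_1\ast\psi_2)(t) &= \int_{-\infty}^{\infty}\psi_1(t-s)\psi_2(s)\,ds = \int_{-\infty}^{\infty}\psi_2(s)\Big(\int_{-\infty}^{\infty}f_1(x)e^{i(t-s)x}\,dx\Big)ds\\ &= \int_{-\infty}^{\infty}f_1(x)e^{itx}\Big(\int_{-\infty}^{\infty}\psi_2(s)e^{-isx}\,ds\Big)dx = \int_{-\infty}^{\infty}2\pi f_1(x)f_2(x)e^{itx}\,dx\\ &= 2\pi\,\psi_{\&(P_1,P_2)}(t)\int_{-\infty}^{\infty}f_1(x)f_2(x)\,dx,\end{aligned}$$
where the first equality follows from the definition of convolution; the second by definition of $\psi_1$; the third by Fubini's theorem since $f_1$ and $\psi_2$ are absolutely integrable; the fourth by the inverse characteristic function theorem (e.g. [14, Theorem 6.2.3]) since $\psi_2$ is $L^1$; and the last equality by Theorem 3.3 since $0 < \int_{-\infty}^{\infty}f_1(x)f_2(x)\,dx < \infty$. □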

The next example is an application of Theorem 3.7, and shows that the conflation of two standard normal distributions is mean-zero normal with half the variance of the standard normal. An intuitive interpretation of this fact is that if the two standard normals reflect the results of two independent experiments, then combining these results effectively doubles the number of trials, thereby halving the variance of the (sample) means. Normality is always preserved under conflation, as will be seen in Theorem 7.1 below.

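Example 3.8. Let $P_1 = P_2$ be $N(0,1)$, so $\psi_1(t) = \psi_2(t) = e^{-t^2/2}$. Then
$$(\psi_1\ast\psi_2)(t) = \int_{-\infty}^{\infty}e^{-(t-s)^2/2}e^{-s^2/2}\,ds = e^{-t^2/4}\int_{-\infty}^{\infty}e^{-(s-t/2)^2}\,ds = \sqrt{\pi}\,e^{-t^2/4},$$
so since $\int_{-\infty}^{\infty}f_1(x)f_2(x)\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-x^2}\,dx = \frac{1}{2\sqrt{\pi}}$, Theorem 3.7 implies that $\&(P_1,P_2)$ is a.c. with characteristic function $\psi(t) = \frac{\sqrt{\pi}\,e^{-t^2/4}}{2\pi\cdot\frac{1}{2\sqrt{\pi}}} = e^{-t^2/4}$, so $\&(P_1,P_2)$ is $N(0,\frac12)$.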
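Theorem 3.7 can also be checked numerically when the answer is not as symmetric as in Example 3.8. The Python sketch below is an illustration under our own choices (inputs $N(0,1)$ and $N(1,2^2)$, midpoint rule on $[-20, 20]$). It computes $(\psi_1\ast\psi_2)(t)/(2\pi\int f_1f_2)$ at a few values of $t$ and compares the result with the characteristic function $e^{0.2it-0.4t^2}$ of the precision-weighted Gaussian $N(0.2, 0.8)$.

```python
import cmath
import math


def normal_cf(mean, sd):
    """Characteristic function t -> exp(i*mean*t - sd^2 t^2 / 2) of N(mean, sd^2)."""
    return lambda t: cmath.exp(1j * mean * t - 0.5 * (sd * t) ** 2)


def convolve_at(g, h, t, lo=-20.0, hi=20.0, n=40_000):
    """(g * h)(t) = int g(t - s) h(s) ds, by the midpoint rule on [lo, hi]."""
    step = (hi - lo) / n
    total = 0j
    for i in range(n):
        s = lo + (i + 0.5) * step
        total += g(t - s) * h(s)
    return total * step


psi1, psi2 = normal_cf(0.0, 1.0), normal_cf(1.0, 2.0)
overlap = math.exp(-0.1) / math.sqrt(10 * math.pi)  # int f1 f2 for N(0,1) and N(1,4)
psi_conf = normal_cf(0.2, math.sqrt(0.8))           # precision-weighted N(0.2, 0.8)

checks = []
for t in (0.0, 0.5, 1.3):
    approx = convolve_at(psi1, psi2, t) / (2 * math.pi * overlap)
    checks.append((t, approx, psi_conf(t)))
    print(t, approx, psi_conf(t))
```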
In general, the convolution of characteristic functions of discrete measures may not even exist.

Example 3.9. Let $P = P_1 = P_2 = \delta_0$. Then it is easy to see that $\&(P_1,P_2) = \delta_0$, and $\psi_P(t) = 1$ for all $t$, so $\psi_{P_1}\ast\psi_{P_2}$ does not even exist.

4 Minimal Loss of Shannon Information 

Replacing several distributions by a single distribution will clearly result in some loss of 
information, however that is defined. A classical measure of information in a stochastic 
setting is the Shannon Information. 

Recall that the Shannon Information $S_P(A)$ (also called the surprisal, or self-information) of a probability $P$ for the event $A\in\mathcal{B}$ is $S_P(A) = -\log_2 P(A)$ (so the smaller the value of $P(A)$, the greater the information or surprise). The information entropy, which will not be addressed here, is simply the expected value of the Shannon information.

Example 4.1. If $P$ is uniformly distributed on $(0,1)$, and $A = (0,\frac14)\cup(\frac12,\frac34)$, then $P(A) = \frac12$, so $S_P(A) = -\log_2(P(A)) = 1$. Thus exactly one bit of information is obtained by observing $A$, namely, that the value of the second binary digit is 0.
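Example 4.1 in code: a Python sketch (the seed and sample size are arbitrary) that computes $S_P(A) = -\log_2 P(A)$ and checks empirically that membership in $A$ is the same as the second binary digit being 0.

```python
import math
import random


def surprisal_bits(prob):
    """Shannon Information S_P(A) = -log2 P(A), in bits."""
    return -math.log2(prob)


def in_A(x):
    """A = (0, 1/4) U (1/2, 3/4)."""
    return 0 < x < 0.25 or 0.5 < x < 0.75


def second_binary_digit(x):
    """Second digit after the binary point of x in [0, 1)."""
    return int(4 * x) % 2


rng = random.Random(1)
samples = [rng.random() for _ in range(100_000)]
freq = sum(in_A(x) for x in samples) / len(samples)
agree = all(in_A(x) == (second_binary_digit(x) == 0) for x in samples if x not in (0.0, 0.5))
print(freq, surprisal_bits(0.5), agree)  # freq near 0.5; exactly 1.0 bit; True
```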

Definition 4.2. The (joint) Shannon Information of $P_1,P_2,\dots,P_n$ for the event $A\in\mathcal{B}$ is
$$S_{\{P_1,\dots,P_n\}}(A) = S_P(X_1\in A,\dots,X_n\in A) = \sum_{i=1}^n S_{P_i}(A) = -\log_2\prod_{i=1}^n P_i(A),$$
where $\{X_i\}$ are independent random variables with distributions $\{P_i\}$, respectively, and the loss between the Shannon Information of $Q\in\mathcal{P}$ and $P_1,\dots,P_n$ for the event $A\in\mathcal{B}$ is $S_{\{P_1,\dots,P_n\}}(A) - S_Q(A)$ if $\prod_{i=1}^n P_i(A) > 0$, and is $0$ if $Q(A) = \prod_{i=1}^n P_i(A) = 0$.

Note that the maximum loss is always nonnegative (taking $A = \mathbb{R}$).
The next theorem characterizes conflation as the minimizer of loss of Shannon Informa- 
tion. 

Theorem 4.3. If $P_1,\dots,P_n\in\mathcal{P}$ satisfy $\|\mu_\infty(P_1,P_2,\dots,P_n)\| > 0$, then

(i) the conflation $\&(P_1,P_2,\dots,P_n)$ exists;

(ii) for every $Q\in\mathcal{P}$, the maximum loss between the Shannon Information of $Q$ and $P_1,\dots,P_n$ is at least $\log_2(\|\mu_\infty(P_1,P_2,\dots,P_n)\|^{-1})$; and

(iii) the bound in (ii) is attained if and only if $Q = \&(P_1,P_2,\dots,P_n)$.
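For finitely supported discrete inputs, Theorem 4.3 (and Corollary 4.4 below) can be checked by brute force, since $Q(A)$ and $\prod_i P_i(A)$ only depend on which atoms $A$ contains. The Python sketch below enumerates all such events for two hypothetical distributions $P_1$ and $P_2$ with common atoms $\{0,1\}$, so $\|\mu_\infty\| = 0.5\cdot0.2 + 0.3\cdot0.5 = 0.25$ and the bound in (ii) is $\log_2 4 = 2$ bits. When $\prod_i P_i(A) = 0 < Q(A)$, the loss is treated as $+\infty$ (our reading of Definition 4.2). The conflation attains the bound, a different $Q$ exceeds it, and $Q = P_1$ has infinite maximum loss because it charges an atom that $P_2$ does not.

```python
import math
from itertools import combinations


def max_loss_bits(Q, pmfs):
    """Brute-force max over events A of S_{P_1..P_n}(A) - S_Q(A), in bits.

    Q and each P_i are finitely supported p.m.f.s {atom: mass}, so A can range over
    subsets of the union of their atoms. If prod_i P_i(A) = 0 < Q(A) the loss is
    treated as +infinity; if both are 0 the loss is 0.
    """
    atoms = sorted(set(Q).union(*pmfs))
    best = 0.0  # A = the whole real line gives loss 0
    for r in range(1, len(atoms) + 1):
        for A in combinations(atoms, r):
            q = sum(Q.get(x, 0.0) for x in A)
            p = math.prod(sum(P.get(x, 0.0) for x in A) for P in pmfs)
            if p > 0.0 and q > 0.0:
                best = max(best, math.log2(q / p))
            elif p == 0.0 and q > 0.0:
                return math.inf
    return best


P1 = {0: 0.5, 1: 0.3, 2: 0.2}
P2 = {0: 0.2, 1: 0.5, 3: 0.3}
common = set(P1) & set(P2)
norm = sum(P1[x] * P2[x] for x in common)  # ||mu_infinity|| = 0.25
conflation = {x: P1[x] * P2[x] / norm for x in common}

bound = math.log2(1.0 / norm)                            # 2 bits
loss_conf = max_loss_bits(conflation, [P1, P2])          # equals the bound
loss_other = max_loss_bits({0: 0.5, 1: 0.5}, [P1, P2])   # log2(5), above the bound
loss_p1 = max_loss_bits(P1, [P1, P2])                    # inf: P1 charges the atom 2
print(bound, loss_conf, loss_other, loss_p1)
```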

Proof. Fix $P_1,\dots,P_n\in\mathcal{P}$, and for brevity, let $\mu_j = \mu_j(P_1,P_2,\dots,P_n)$ for all $j\in\mathbb{N}$, and $\mu_\infty = \mu_\infty(P_1,P_2,\dots,P_n)$. For (i), note that by Theorem 2.3, $\mu_j$ converges vaguely to $\mu_\infty$, and $\lim_{j\to\infty}\|\mu_j\| = \|\mu_\infty\| > 0$, so $\mu_j/\|\mu_j\|$ converges vaguely to the probability measure $\mu_\infty\|\mu_\infty\|^{-1}$, which implies that $\&(P_1,P_2,\dots,P_n)$ exists.
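For (ii) and (iii), fix $Q\in\mathcal{P}$, and let $\& = \&(P_1,P_2,\dots,P_n)$. It must be shown that
$$S_{\{P_1,\dots,P_n\}}(A) - S_Q(A) \ge \log_2(\|\mu_\infty\|^{-1})\quad\text{for some Borel } A, \tag{4.1a}$$
$$S_{\{P_1,\dots,P_n\}}(A) - S_Q(A) > \log_2(\|\mu_\infty\|^{-1})\quad\text{for some Borel } A \text{ if } Q\ne\&, \tag{4.1b}$$
and
$$S_{\{P_1,\dots,P_n\}}(A) - S_Q(A) \le \log_2(\|\mu_\infty\|^{-1})\quad\text{for all Borel } A \text{ if } Q = \&. \tag{4.1c}$$
By definition of Shannon Information, and since $\log_2(x)$ is increasing, (4.1a)-(4.1c) are equivalent to
$$\frac{Q(A)}{\prod_{i=1}^n P_i(A)} \ge \|\mu_\infty\|^{-1}\quad\text{for some Borel } A, \tag{4.2a}$$
$$\frac{Q(A)}{\prod_{i=1}^n P_i(A)} > \|\mu_\infty\|^{-1}\quad\text{for some Borel } A \text{ if } Q\ne\&, \tag{4.2b}$$
$$\frac{Q(A)}{\prod_{i=1}^n P_i(A)} \le \|\mu_\infty\|^{-1}\quad\text{for all Borel } A \text{ if } Q = \&. \tag{4.2c}$$
To establish (4.2a), fix $\epsilon$ with $\|\mu_\infty\|^{-1} > \epsilon > 0$. By Theorem 2.3, $\|\mu_j\|\to\|\mu_\infty\|$ as $j\to\infty$, so there exists $j^*\in\mathbb{N}$ such that
$$\|\mu_{j^*}\|^{-1} > \|\mu_\infty\|^{-1} - \epsilon > 0. \tag{4.3}$$
For each $k\in\mathbb{Z}$, let $q_k = Q\big(\frac{k}{2^{j^*}},\frac{k+1}{2^{j^*}}\big]$ and $p_k = \prod_{i=1}^n P_i\big(\frac{k}{2^{j^*}},\frac{k+1}{2^{j^*}}\big]$, and note that by the definition of $\mu_{j^*}$,
$$\sum_{k\in\mathbb{Z}}p_k = \|\mu_{j^*}\|. \tag{4.4}$$
Since $Q$ is a probability, (4.4) implies that $1 = \sum_{k\in\mathbb{Z}}q_k = \sum_{k\in\mathbb{Z}}p_k\|\mu_{j^*}\|^{-1}$, so there exists $k^*\in\mathbb{Z}$ such that
$$q_{k^*} \ge p_{k^*}\|\mu_{j^*}\|^{-1} > 0. \tag{4.5}$$
Hence, by (4.3) and (4.5) and the definition of $\{p_k\}$ and $\{q_k\}$,
$$\frac{Q\big(\frac{k^*}{2^{j^*}},\frac{k^*+1}{2^{j^*}}\big]}{\prod_{i=1}^n P_i\big(\frac{k^*}{2^{j^*}},\frac{k^*+1}{2^{j^*}}\big]} \ge \|\mu_{j^*}\|^{-1} > \|\mu_\infty\|^{-1} - \epsilon. \tag{4.6}$$
By Lyapounov's theorem, the range of a finite-dimensional vector measure is closed (e.g. [?] or [?, Theorem 1.1]), so since $\epsilon$ was arbitrarily small, this proves (4.2a).

To prove (4.2b), suppose $Q\ne\&$. Then there exist $c > 0$, $k^*\in\mathbb{Z}$ and $j^*\in\mathbb{N}$ such that for $D = \big(\frac{k^*}{2^{j^*}},\frac{k^*+1}{2^{j^*}}\big]$, $\&(D) > 0$ and $Q(D) > \&(D) + c\,\mu_\infty(D)$. Since $\& = \mu_\infty\|\mu_\infty\|^{-1}$, this implies that
$$\frac{Q(D)}{\mu_\infty(D)} > \|\mu_\infty\|^{-1} + c. \tag{4.7}$$
Since $\mu_j(D)\to\mu_\infty(D)$ as $j\to\infty$ by Theorem 2.3(ii), (4.7) implies there exists an $m\in\mathbb{N}$ so that
$$\frac{Q(D)}{\mu_{j^*+m}(D)} > \|\mu_\infty\|^{-1} + \frac{c}{2}. \tag{4.8}$$
Note that $D = \bigcup_{k\in J}D_k$, where $D_k = \big(\frac{k}{2^{j^*+m}},\frac{k+1}{2^{j^*+m}}\big]$ and $J = \{k^*2^m, k^*2^m+1,\dots,k^*2^m+2^m-1\}$. Next, note that since $\frac{\sum_k a_k}{\sum_k b_k}\le\max_k\frac{a_k}{b_k}$ for nonnegative $\{a_k, b_k\}$, there exists $M\in J$ such that
$$\frac{Q(D_M)}{\prod_{i=1}^n P_i(D_M)} \ge \frac{\sum_{k\in J}Q(D_k)}{\sum_{k\in J}\prod_{i=1}^n P_i(D_k)}. \tag{4.9}$$
Then
$$\frac{Q(D_M)}{\prod_{i=1}^n P_i(D_M)} \ge \frac{Q(D)}{\mu_{j^*+m}(D)} > \|\mu_\infty\|^{-1}, \tag{4.10}$$
where the first inequality in (4.10) follows by (4.9) and since $\mu_{j^*+m}(D) = \sum_{k\in J}\prod_{i=1}^n P_i(D_k)$, and the second by (4.8). This proves (4.2b). Finally, suppose that $\& = Q$. Since the class of sets $\{(\frac{k}{2^j},\frac{k+1}{2^j}]: j\in\mathbb{N}, k\in\mathbb{Z}\}$ generates the Borel sigma algebra on $\mathbb{R}$, and since $Q = \& = \mu_\infty\|\mu_\infty\|^{-1}$, to prove (4.2c) it is enough to show that for all $j\in\mathbb{N}$, all finite sets $J\subset\mathbb{Z}$ and all $D = \bigcup_{k\in J}(\frac{k}{2^j},\frac{k+1}{2^j}]$,
$$\mu_\infty(D) \le \prod_{i=1}^n P_i(D), \tag{4.11}$$
but since $\lim_{m\to\infty}\mu_m(D) = \mu_\infty(D)$ and, by Lemma 2.4, $\mu_j(D)\le\prod_{i=1}^n P_i(D)$, (4.11) follows by Theorem 2.3(i). □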

Corollary 4.4. If $P_1,\dots,P_n\in\mathcal{P}$ are discrete with common atoms $A\ne\emptyset$, then $\&(P_1,\dots,P_n)$ is the unique Borel probability distribution that minimizes the maximum loss of Shannon Information between single Borel probability distributions and $P_1,P_2,\dots,P_n$.

Proof. It is easy to check that for discrete distributions $P_1,\dots,P_n$ with common atoms $A$, $\|\mu_\infty(P_1,\dots,P_n)\| = \sum_{x\in A}\prod_{i=1}^n P_i(x)$, which by the definition of $A$ is strictly positive. The conclusion then follows immediately from Theorems 3.1 and 4.3. □

Theorem 4.5. If $P_1,P_2,\dots,P_n$ are a.c. with densities $f_1,\dots,f_n$ satisfying
$$\int_{-\infty}^{\infty}\prod_{i=1}^n f_i(x)\,dx < \infty,$$
then there are Borel probability distributions $\{P_{i,j}: i\in\{1,\dots,n\}, j\in\mathbb{N}\}$ such that

(i) for all $i$, $P_{i,j}$ converges vaguely to $P_i$ as $j\to\infty$,

(ii) $\&(P_{1,j},\dots,P_{n,j})$ is the unique minimizer of loss of Shannon Information from $P_{1,j},\dots,P_{n,j}$, and

(iii) $\&(P_1,\dots,P_n)$ is the vague limit of $\&(P_{1,j},\dots,P_{n,j})$ as $j\to\infty$.
Proof. For each $i \in \{1, \dots, n\}$ and $j \in \mathbb{N}$, let $P_{i,j} = \mu_j(P_i)$, and note that $\mu_j(P_i)$ is a discrete p.m. for all $i$ and $j$, and by Theorem 2.3(iv), $\mu_j(P_i) \to P_i$ vaguely as $j \to \infty$, which proves (i). Since $\{P_{i,j} : i \in \{1, \dots, n\}\}$ are compatible for all $j \in \mathbb{N}$, $\mu_j(P_1), \dots, \mu_j(P_n)$ are discrete with at least one common atom, so by Theorem 3.1 and Corollary 4.4,

$$\&(P_{1,j}, \dots, P_{n,j}) = \frac{\sum_{k\in\mathbb{Z}}\prod_{i=1}^n P_i\big(((k-1)2^{-j}, k2^{-j}]\big)\,\delta_{k2^{-j}}}{\sum_{k\in\mathbb{Z}}\prod_{i=1}^n P_i\big(((k-1)2^{-j}, k2^{-j}]\big)}$$

is the unique minimizer of the maximum loss of Shannon Information between single Borel p.m.'s and $\{P_{i,j} : i \in \{1, \dots, n\}\}$, which proves (ii). Finally, note that for all $j \in \mathbb{N}$, $\prod_{i=1}^n P_i(((k-1)2^{-j}, k2^{-j}]) = \prod_{i=1}^n P_{i,j}(\{k2^{-j}\})$, so by the definition of $\{\mu_j\}$,

$$\mu_j(P_1, \dots, P_n) = \sum_{k\in\mathbb{Z}}\prod_{i=1}^n P_{i,j}(\{k2^{-j}\})\,\delta_{k2^{-j}} \quad\text{and}\quad \|\mu_j(P_1, \dots, P_n)\| = \sum_{k\in\mathbb{Z}}\prod_{i=1}^n P_{i,j}(\{k2^{-j}\}) > 0.$$

Hence, by Theorem 3.3,

$$\&(P_{1,j}, \dots, P_{n,j}) = \frac{\sum_{k\in\mathbb{Z}}\prod_{i=1}^n P_{i,j}(\{k2^{-j}\})\,\delta_{k2^{-j}}}{\sum_{k\in\mathbb{Z}}\prod_{i=1}^n P_{i,j}(\{k2^{-j}\})} = \frac{\mu_j(P_1, \dots, P_n)}{\|\mu_j(P_1, \dots, P_n)\|}$$

converges vaguely to $\&(P_1, \dots, P_n)$, proving (iii). □
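The discretization used in this proof can be tried numerically. The sketch below is a hypothetical illustration that assumes normal inputs $P_i$ (so that Theorem 6.1 gives the limit in closed form): it forms $P_{i,j} = \mu_j(P_i)$ by placing the mass of $((k-1)2^{-j}, k2^{-j}]$ at $k2^{-j}$, conflates these discrete distributions by normalizing the product of their masses, and reports the mean. The function names and the finite window `[lo, hi]` that truncates the normal tails are assumptions of the sketch.

```python
import math


def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def dyadic_conflation_mean(params, j, lo=-10.0, hi=10.0):
    """Mean of &(P_{1,j}, ..., P_{n,j}), where P_{i,j} = mu_j(P_i).

    params lists (mu_i, sigma_i) for normal P_i.  mu_j(P_i) puts the mass of
    ((k-1) 2^-j, k 2^-j] at k 2^-j; only atoms in [lo, hi] are kept, which
    drops a negligible amount of normal tail mass.
    """
    h = 2.0 ** (-j)
    num = den = 0.0
    for k in range(math.floor(lo / h), math.ceil(hi / h) + 1):
        # Unnormalized conflation mass at k 2^-j: the product of interval masses.
        w = 1.0
        for mu, sigma in params:
            w *= normal_cdf(k * h, mu, sigma) - normal_cdf((k - 1) * h, mu, sigma)
        num += w * k * h
        den += w
    return num / den  # dividing by den is the normalization of Theorem 3.1


# P_1 = N(1, 1) and P_2 = N(2, 4), given as (mu, sigma); the limiting mean is 6/5.
coarse = dyadic_conflation_mean([(1.0, 1.0), (2.0, 2.0)], j=2)
fine = dyadic_conflation_mean([(1.0, 1.0), (2.0, 2.0)], j=10)
```

Because each atom sits at the right endpoint of its interval, the discrete means are shifted by roughly $2^{-j-1}$; the shift disappears as $j \to \infty$, in line with the vague convergence in (iii).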



5 Minimax Likelihood Ratio Consolidations and Proportional Consolidations

In classical hypothesis testing, a standard technique to decide from which of $n$ known distributions given data actually came is to maximize the likelihood ratios, that is, the ratios of the p.m.f.'s or p.d.f.'s. Analogously, when the objective is not to decide from which of $n$ known distributions $P_1, \dots, P_n$ the data came, but rather to decide how best to consolidate data from those input distributions into a single (output) distribution $P$, one natural criterion is to choose $P$ so as to make the ratios of the likelihood of observing $x$ under $P$ to the likelihood of observing $x$ under all of the (independent) distributions $\{P_i\}$ as close to one another, over all $x$, as possible. This motivates the notion of minimax likelihood ratio.

Definition 5.1. A discrete probability distribution $P^* \in \mathcal{P}$ (with p.m.f. $p^*$) is the minimax likelihood ratio (MLR) consolidation of discrete distributions $P_1, \dots, P_n$ (with p.m.f.'s $\{p_i\}$) if

$$\min_{\text{p.m.f.'s } p}\left\{\max_{x\in\mathbb{R}}\frac{p(x)}{\prod_{i=1}^n p_i(x)} - \min_{x\in\mathbb{R}}\frac{p(x)}{\prod_{i=1}^n p_i(x)}\right\}$$

is attained by $p = p^*$ (where $0/0 := 1$). Similarly, an a.c. distribution $P^* \in \mathcal{P}$ (with p.d.f. $f^*$) is the MLR consolidation of a.c. distributions $P_1, \dots, P_n$ (with p.d.f.'s $f_1, \dots, f_n$) if

$$\min_{\text{p.d.f.'s } f}\left\{\operatorname*{ess\,sup}_{x\in\mathbb{R}}\frac{f(x)}{\prod_{i=1}^n f_i(x)} - \operatorname*{ess\,inf}_{x\in\mathbb{R}}\frac{f(x)}{\prod_{i=1}^n f_i(x)}\right\}$$

is attained by $f^*$.

The min-max terms in the two displays of Definition 5.1 are similar to the min-max criterion for loss of Shannon Information (Theorem 4.3), whereas the others are dual max-min criteria. Just as conflation was shown to minimize the loss of Shannon Information, conflation will now be shown to also be the MLR consolidation of the given input distributions.

Theorem 5.2. If $P_1, \dots, P_n \in \mathcal{P}$ are discrete with at least one common atom, or are a.c. with p.d.f.'s $\{f_i\}$ satisfying $0 < \int\prod_{i=1}^n f_i(x)\,dx < \infty$, then $\&(P_1, \dots, P_n)$ is the unique MLR consolidation of $P_1, \dots, P_n$.

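Proof. First consider the discrete case, let $\{p_i\}$ denote the p.m.f.'s of $\{P_i\}$, respectively, and let $\emptyset \neq A \subset \mathbb{R}$ denote the common atoms of $\{P_i\}$, i.e. $A = \{x \in \mathbb{R} : \prod_{i=1}^n p_i(x) > 0\}$. By Theorem 3.1, $\&(P_1, \dots, P_n)$ is discrete with p.m.f. $p^*(x) = \prod_{i=1}^n p_i(x) / \sum_{y\in A}\prod_{i=1}^n p_i(y)$. For each p.m.f. $p$, let

$$\Delta(p) = \sup_{x\in\mathbb{R}}\frac{p(x)}{\prod_{i=1}^n p_i(x)} - \inf_{x\in\mathbb{R}}\frac{p(x)}{\prod_{i=1}^n p_i(x)}.$$

Then, since $p^*(x) = 0$ for every $x \in A^c$, it follows from the definition of $p^*$ (and the convention $0/0 := 1$) that $\Delta(p^*) = \big(\sum_{y\in A}\prod_{i=1}^n p_i(y)\big)^{-1} - 1 \ge 0$. Thus, to establish the theorem for $P_1, \dots, P_n$ discrete, it suffices to show that for all p.m.f.'s $p$

$$\Delta(p) \ge \Big(\sum_{y\in A}\prod_{i=1}^n p_i(y)\Big)^{-1} - 1, \quad\text{with equality if and only if } p = p^*. \qquad (5.1)$$

If $\sum_{y\in A} p(y) < 1$, then there exists an $x_0 \in A^c$ with $p(x_0) > 0$, so $p(x_0)/\prod_{i=1}^n p_i(x_0) = \infty$ and $\Delta(p) = \infty$, so (5.1) is trivial. On the other hand, if $\sum_{y\in A} p(y) = 1$, then $\min_{x\in\mathbb{R}} p(x)/\prod_{i=1}^n p_i(x) \le 1$ (the ratio is $0/0 := 1$ on the nonempty set $A^c$), which implies that

$$\Delta(p) \ge \max_{x\in A}\frac{p(x)}{\prod_{i=1}^n p_i(x)} - 1 \ge \frac{\sum_{y\in A} p(y)}{\sum_{y\in A}\prod_{i=1}^n p_i(y)} - 1 = \Big(\sum_{y\in A}\prod_{i=1}^n p_i(y)\Big)^{-1} - 1,$$

and the argument in the proof of Theorem 4.3 shows equality holds if and only if $p(x)/\prod_{i=1}^n p_i(x)$ is constant on $A$, i.e. if and only if $p = p^*$. This proves (5.1) and completes the argument when $\{P_i\}$ are discrete.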

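For the a.c. conclusion, fix $\{P_i\}$ a.c. with p.d.f.'s satisfying $0 < \int\prod_{i=1}^n f_i(x)\,dx < \infty$. By Theorem 3.3, $\&(P_1, \dots, P_n)$ is a.c. with p.d.f. $f^*(x) = \prod_{i=1}^n f_i(x) / \int\prod_{i=1}^n f_i(y)\,dy$. For each p.d.f. $f$, let

$$\Delta(f) = \operatorname*{ess\,sup}_{x\in\mathbb{R}}\frac{f(x)}{\prod_{i=1}^n f_i(x)} - \operatorname*{ess\,inf}_{x\in\mathbb{R}}\frac{f(x)}{\prod_{i=1}^n f_i(x)}.$$

Case 1. $\int\prod_{i=1}^n f_i(x)\,dx \in (0,1]$ and $\prod_{i=1}^n f_i(x) > 0$ a.s. (e.g., $\{P_i\}$ arbitrary normal distributions). Then since $\prod_{i=1}^n f_i(x) > 0$, $f^*(x)/\prod_{i=1}^n f_i(x) = \big(\int\prod_{i=1}^n f_i(y)\,dy\big)^{-1}$ is constant, so $\Delta(f^*) = 0$. Thus it suffices to show that for all p.d.f.'s $f$ (with $\{P_i\}$ as in Case 1),

$$\Delta(f) \ge 0, \quad\text{with equality if and only if } f = f^*. \qquad (5.2)$$

If $f$ is not positive a.s., then $\operatorname{ess\,inf} f(x)/\prod_{i=1}^n f_i(x) = 0$ since $\prod_{i=1}^n f_i(x) > 0$ a.s., so $\Delta(f) = \operatorname{ess\,sup} f(x)/\prod_{i=1}^n f_i(x) > 0$, and the inequality in (5.2) is satisfied. On the other hand, if $f > 0$ a.s., then $\Delta(f) = \operatorname{ess\,sup}_{x\in\mathbb{R}} f(x)/\prod_{i=1}^n f_i(x) - \operatorname{ess\,inf}_{x\in\mathbb{R}} f(x)/\prod_{i=1}^n f_i(x) \ge 0$, with equality if and only if $f(x)/\prod_{i=1}^n f_i(x)$ is constant a.s., i.e. if and only if $f = f^*$ a.s., which completes the argument for Case 1.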



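The three other cases,

Case 2. $\int\prod_{i=1}^n f_i(x)\,dx \in (0,1]$ and $\prod_{i=1}^n f_i = 0$ on a set of positive Lebesgue measure;

Case 3. $\int\prod_{i=1}^n f_i(x)\,dx \in (1,\infty)$ and $\prod_{i=1}^n f_i(x) > 0$ a.s.;

Case 4. $\int\prod_{i=1}^n f_i(x)\,dx \in (1,\infty)$ and $\prod_{i=1}^n f_i = 0$ on a set of positive Lebesgue measure,

follow similarly. □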
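The quantity $\Delta(p)$ in the discrete half of this proof can be evaluated directly. The sketch below is hypothetical (the function `lr_spread` and the finite list `points` standing in for $\mathbb{R}$ are assumptions); it applies the convention $0/0 := 1$, treats a positive mass over zero as $+\infty$, and reproduces $\Delta(p^*) = \big(\sum_{y\in A}\prod_i p_i(y)\big)^{-1} - 1$ for a small example.

```python
import math
from fractions import Fraction


def lr_spread(p, pmfs, points):
    """Delta(p) from the proof of Theorem 5.2, evaluated on a finite set of points.

    p and each element of pmfs are dicts {atom: probability}.  `points` stands
    in for R: it should contain every atom involved and at least one point
    outside all supports.  The convention 0/0 := 1 is applied, and a positive
    mass over zero is treated as +infinity.
    """
    ratios = []
    for x in points:
        num = p.get(x, Fraction(0))
        den = Fraction(1)
        for pmf in pmfs:
            den *= pmf.get(x, Fraction(0))
        if den == 0:
            ratios.append(Fraction(1) if num == 0 else math.inf)
        else:
            ratios.append(num / den)
    return max(ratios) - min(ratios)


P1 = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
P2 = {2: Fraction(1, 2), 3: Fraction(1, 4), 4: Fraction(1, 4)}
points = [0, 1, 2, 3, 4]                          # 0 lies outside both supports
p_star = {2: Fraction(2, 3), 3: Fraction(1, 3)}  # the conflation of P1 and P2
S = Fraction(1, 6) + Fraction(1, 12)              # sum over A = {2, 3} of p1 * p2

spread_star = lr_spread(p_star, [P1, P2], points)  # 1/S - 1 = 3
spread_other = lr_spread({2: Fraction(1, 2), 3: Fraction(1, 2)}, [P1, P2], points)  # 5
spread_off_A = lr_spread(P1, [P1, P2], points)     # inf: P1 has mass at atom 1, not in A
```

The competing p.m.f. on $A$ does worse than $p^*$, and a p.m.f. with mass outside $A$ has infinite spread, matching the two cases of the proof.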

If the $\{P_i\}$ are a.c. but do not satisfy the integrability condition in the hypotheses of Theorem 5.2, both parts of the conclusion of Theorem 5.2 may fail: the conflation may not be MLR, and MLR distributions may not be unique.



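Example 5.3. Let $n = 2$, and $P_1 = P_2$ be as in Example 3.6, so the conflation $\&(P_1, P_2)$ exists and is $\delta_0$, which is not MLR for $P_1, P_2$ since it is not even a.c. However, every a.c. distribution with p.d.f. $f_\alpha(x) = \alpha x^{\alpha-1}$ for $x \in (0,1)$ (and $= 0$ otherwise), $0 < \alpha < 1/4$, is MLR for $P_1, P_2$. To see this, recall that $\prod_{i=1}^2 f_i(x) = (4x)^{-1}$ for $x \in (0,1)$, and $= 0$ otherwise. Thus $f_\alpha(x)/\prod_{i=1}^2 f_i(x) = 4x f_\alpha(x) = 4\alpha x^\alpha$ for $x \in (0,1)$, so $\operatorname{ess\,sup}_{x\in(0,1)} f_\alpha(x)/\prod_{i=1}^2 f_i(x) = 4\alpha < 1$, while off $(0,1)$ the ratio is $0/0 := 1$; hence $\operatorname{ess\,sup}_{x\in\mathbb{R}} f_\alpha(x)/\prod_{i=1}^2 f_i(x) = 1$. Next, $\operatorname{ess\,inf}_{x\in\mathbb{R}} f_\alpha(x)/\prod_{i=1}^2 f_i(x) = 0$ since $f_\alpha(x)/\prod_{i=1}^2 f_i(x) = 4\alpha x^\alpha$ for $x \in (0,1)$. Thus $\Delta(f_\alpha) = 1$, so showing that $f_\alpha$ is MLR requires showing that $\Delta(f) \ge 1$ for all p.d.f.'s $f$. Fix $f$, and note that if $\operatorname{ess\,inf}_{x\in\mathbb{R}} f(x)/\prod_{i=1}^2 f_i(x) = c > 0$, then on $(0,1)$, $f(x)/\prod_{i=1}^2 f_i(x) = 4x f(x) \ge c$ a.s., so $f(x) \ge c/(4x)$ a.s., which cannot be a density since it is not integrable. Hence $\operatorname{ess\,inf}_{x\in\mathbb{R}} f(x)/\prod_{i=1}^2 f_i(x) = 0$. But $\operatorname{ess\,sup}_{x\in\mathbb{R}} f(x)/\prod_{i=1}^2 f_i(x) \ge 1$, since $f$ is a.s. nonnegative and $\prod_{i=1}^2 f_i(x) = 0$ for $x \notin (0,1)$. Thus $\Delta(f) \ge 1$, so $f_\alpha$ is MLR.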

In the underlying problem of consolidating the independent distributions $P_1, \dots, P_n$ into a single distribution $Q$, a criterion similar to MLR is to require that $Q$ reflect the relative likelihoods of identical individual outcomes under the $\{P_i\}$. For example, if the likelihood of all the experiments $\{P_i\}$ observing the identical outcome $x$ is twice the likelihood of all the experiments $\{P_i\}$ observing $y$, then $Q(x)$ should also be twice as large as $Q(y)$. This motivates the notion of proportional consolidation.

Definition 5.4. For discrete $P_1, \dots, P_n \in \mathcal{P}$ with p.m.f.'s $p_1, \dots, p_n$, respectively, the discrete distribution $Q \in \mathcal{P}$ is a proportional consolidation of $P_1, \dots, P_n$ if its p.m.f. $q$ satisfies

$$\frac{q(x)}{q(y)} = \frac{\prod_{i=1}^n p_i(x)}{\prod_{i=1}^n p_i(y)} \quad\text{for all } x, y \in \mathbb{R}.$$

Similarly, for a.c. $P_1, \dots, P_n \in \mathcal{P}$ with p.d.f.'s $f_1, \dots, f_n$, respectively, the a.c. distribution $Q \in \mathcal{P}$ is a proportional consolidation of $P_1, \dots, P_n$ if its p.d.f. $g$ satisfies

$$\frac{g(x)}{g(y)} = \frac{\prod_{i=1}^n f_i(x)}{\prod_{i=1}^n f_i(y)} \quad\text{for Lebesgue-almost-all } x, y \in \mathbb{R}.$$
Theorem 5.5. If $P_1, \dots, P_n \in \mathcal{P}$ are discrete with at least one common atom, or are a.c. with p.d.f.'s $\{f_i\}$ satisfying $0 < \int\prod_{i=1}^n f_i(x)\,dx < \infty$, then the conflation $\&(P_1, \dots, P_n)$ is the unique proportional consolidation of $P_1, \dots, P_n$.

Proof. First consider the case where $\{P_i\}$ are discrete, let $\{p_i\}$ be the p.m.f.'s for $\{P_i\}$, respectively, and let $A$ be their set of common atoms. By Theorem 3.1 again, $\&(P_1, \dots, P_n)$ is discrete with p.m.f. $p^*(x) = \prod_{i=1}^n p_i(x) / \sum_{z\in A}\prod_{i=1}^n p_i(z)$ for all $x \in \mathbb{R}$. Thus $p^*(x)/p^*(y) = \prod_{i=1}^n p_i(x)/\prod_{i=1}^n p_i(y)$, so $\&(P_1, \dots, P_n)$ is a proportional consolidation of $P_1, \dots, P_n$. To see that $\&(P_1, \dots, P_n)$ is the unique proportional consolidation, suppose $Q \neq \&(P_1, \dots, P_n)$, and set $q(x) = Q(\{x\})$ for all $x \in \mathbb{R}$. Since $Q \neq \&(P_1, \dots, P_n)$, it follows from Theorem 3.1 that there exist $x, y \in \mathbb{R}$ so that $q(x) > \prod_{i=1}^n p_i(x)/\sum_{z\in A}\prod_{i=1}^n p_i(z)$ and $q(y) < \prod_{i=1}^n p_i(y)/\sum_{z\in A}\prod_{i=1}^n p_i(z)$, so $q(x)/q(y) > \prod_{i=1}^n p_i(x)/\prod_{i=1}^n p_i(y)$, and $Q$ is not a proportional consolidation of $P_1, \dots, P_n$. The case where $P_1, \dots, P_n$ are a.c. follows similarly, again using Theorem 3.3 in place of Theorem 3.1. □
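The defining identity of Definition 5.4 can be checked mechanically for finitely supported p.m.f.'s. In this hypothetical sketch, `is_proportional` tests the identity in cross-multiplied form, $q(x)\prod_i p_i(y) = q(y)\prod_i p_i(x)$, on all pairs from a finite list of points; the cross-multiplication is a choice made here to avoid dividing by zero, not something the paper specifies.

```python
from fractions import Fraction
from itertools import combinations


def is_proportional(q, pmfs, points):
    """Check q(x)/q(y) == prod_i p_i(x) / prod_i p_i(y) for all pairs in points.

    All p.m.f.'s are dicts {atom: probability}.  The identity is tested as
    q(x) * prod_i p_i(y) == q(y) * prod_i p_i(x), which avoids division by zero.
    """
    def prod(x):
        out = Fraction(1)
        for pmf in pmfs:
            out *= pmf.get(x, Fraction(0))
        return out

    return all(
        q.get(x, Fraction(0)) * prod(y) == q.get(y, Fraction(0)) * prod(x)
        for x, y in combinations(points, 2)
    )


P1 = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
P2 = {2: Fraction(1, 2), 3: Fraction(1, 4), 4: Fraction(1, 4)}
p_star = {2: Fraction(2, 3), 3: Fraction(1, 3)}  # conflation of P1 and P2
other = {2: Fraction(1, 2), 3: Fraction(1, 2)}   # a different p.m.f. on the common atoms
```

As Theorem 5.5 predicts, the conflation passes the check while the other p.m.f. fails at the pair $(2, 3)$.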

Here, too, the conclusion for a.c. distributions may fail if the integrability hypothesis is not satisfied.

Example 5.6. Let $n = 2$, and $P_1 = P_2$ be as in Example 5.3, so again $\prod_{i=1}^2 f_i(x) = (4x)^{-1}$ for $x \in (0,1)$, and $= 0$ otherwise. This implies that $\prod_{i=1}^2 f_i(x)/\prod_{i=1}^2 f_i(y) = y/x$ for Lebesgue almost all $x, y \in (0,1)$. But there are no p.d.f.'s $f$ with support on $(0,1)$ such that $f(x)/f(y) = y/x$ a.s., since then for fixed $y$, $f(x) = c x^{-1}$ with $c = y f(y)$ for almost all $x \in (0,1)$, and $\int_0^1 c x^{-1}\,dx = 0$ if $c = 0$ and $= \infty$ if $c > 0$. Thus, there is no proportional consolidation of this $P_1, P_2$ (in contrast to the conclusion of Example 5.3 for these same distributions, where it was seen that there are many MLR consolidations).



6 Conflations of Normal Distributions 

In describing the method used to obtain values for the fundamental physical constants from the input data, NIST explains that certain data "are the means of tens of individual values, with each value being the average of about ten data points" ([13], p. 679), and predicates interpretation of some of their conclusions on the condition "If the probability distribution associated with each input datum is assumed to be normal" ([11], p. 483). After comparing the most recent (2006) results from electrical watt-balance and from silicon-lattice sphere experiments used to estimate Planck's constant, however, NIST determined that the means and standard deviations of several distributions of input data were not sufficiently close, and reported that their "data analysis uncovered two major inconsistencies with the input data," conceding that the resulting official NIST 2006 set of recommended values for the fundamental physical constants "does not rest on as solid a foundation as one might wish" ([12], p. 54). In order to eliminate this perceived inconsistency, the NIST task group "ultimately decided that . . . the a priori assigned uncertainties of the input data involved in the two discrepancies would be weighted by the multiplicative factor 1.5," which "reduced the discrepancies to a level comfortably between two standard deviations" ([12], p. 54).
But if the various input distributions are all normal, for example, as in the NIST assump-
tion, then every interval centered at the unknown positive true value of Planck's constant 
has a positive probability of occurring in every one of the independent experiments. If the 
input data distributions happen to have different means and variances, that does not imply 
the input is "inconsistent." Thus in consolidating data from several independent sources, 
special attention should be paid to the normal case. 

The conflation of normal distributions has several important properties: it is itself normal (hence unimodal), and in addition to minimizing the loss of Shannon Information (Theorem 4.3) and being the unique MLR consolidation (Theorem 5.2) and the unique proportional consolidation (Theorem 5.5), the conflation of normal distributions also yields the classical weighted least squares and best linear unbiased estimators for general unbiased data, and maximum likelihood estimators for normally-distributed unbiased input data.

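Theorem 6.1. If $P_i$ is $N(\mu_i, \sigma_i^2)$, $i = 1, \dots, n$, then

$$\&(P_1, \dots, P_n) = N\left(\frac{\sum_{i=1}^n \mu_i\sigma_i^{-2}}{\sum_{i=1}^n \sigma_i^{-2}},\ \frac{1}{\sum_{i=1}^n \sigma_i^{-2}}\right).$$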

Proof. By Theorem 3.3, $\&(P_1, \dots, P_n)$ is a.c. with density proportional to the product of
the densities for each distribution, and the conclusion then follows immediately from the 
definition of normal densities and a routine calculation by completing the square. □ 

Example 6.2. If $P_1$ is $N(1, 1)$ and $P_2$ is $N(2, 4)$, then $\&(P_1, P_2)$ is $N(6/5, 4/5)$.
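Theorem 6.1 and Example 6.2 can be checked with a few lines of Python. The sketch below (function names are illustrative) computes the conflated mean and variance exactly with `Fraction`, then confirms numerically the completing-the-square step of the proof: the product of the two input densities is a constant multiple of the $N(6/5, 4/5)$ density.

```python
import math
from fractions import Fraction


def conflate_normals(means, variances):
    """Mean and variance of &(N(mu_1, s_1^2), ..., N(mu_n, s_n^2)) per Theorem 6.1."""
    precision = sum(1 / Fraction(v) for v in variances)
    mean = sum(Fraction(m) / Fraction(v) for m, v in zip(means, variances)) / precision
    return mean, 1 / precision


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


mean, var = conflate_normals([1, 2], [1, 4])  # Example 6.2: (Fraction(6, 5), Fraction(4, 5))

# Completing the square: the product of the input densities divided by the
# N(6/5, 4/5) density should not depend on x.  By the standard product
# identity for Gaussian densities, the constant is the N(2, 5) density at 1.
ratios = [
    normal_pdf(x, 1, 1) * normal_pdf(x, 2, 4) / normal_pdf(x, float(mean), float(var))
    for x in (-2.0, -0.5, 0.0, 1.0, 2.5, 4.0)
]
```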

The mean of the conflation of normals given in Theorem 6.1, $\big(\sum_{i=1}^n \mu_i\sigma_i^{-2}\big)\big(\sum_{i=1}^n \sigma_i^{-2}\big)^{-1}$, is precisely the value of the weighted least squares estimate given by Aitken's generalization of the Gauss-Markov Theorem, and this simple observation will next be exploited to obtain several conclusions relating conflation and statistical estimators.

First, however, it must be remarked that the mean of the conflation is not in general the same as the weighted least squares estimate. Conflation disregards outlier or "inconsistent" data values, whereas weighted least squares gives full weight to all values. For instance, if the support of one of the input distributions includes negative values (e.g., it is reported as a true Gaussian) and the supports of the others do not, then conflation eliminates the negative values. The following example for the uniform distribution illustrates this, and the same argument can easily be applied to other distributions such as truncated normals (Theorem 7.2 below).

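Example 6.3. Let $P_1$ be $U(0, 1)$ and $P_2$ be $U(-0.1, 1)$. By Theorem 3.3 the conflation of $P_1$ and $P_2$ is $\&(P_1, P_2) = U(0, 1)$, which ignores the negative values of $P_2$ and has mean $1/2$. The weighted least squares estimate, however, with means $1/2$ and $0.45$ and variances $1/12$ and $1.21/12$, is easily seen to be

$$\left(\frac{1/2}{1/12} + \frac{0.45}{1.21/12}\right)\left(\frac{1}{1/12} + \frac{1}{1.21/12}\right)^{-1} \approx 0.48.$$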
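The arithmetic of Example 6.3 is reproduced below as a short sketch, assuming only the standard facts that $U(a, b)$ has mean $(a+b)/2$ and variance $(b-a)^2/12$; the helper names are illustrative.

```python
def uniform_moments(a, b):
    """Mean and variance of U(a, b)."""
    return (a + b) / 2, (b - a) ** 2 / 12


def weighted_least_squares_mean(means, variances):
    """Inverse-variance weighted mean, as in the display of Example 6.3."""
    weights = [1 / v for v in variances]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


(m1, v1), (m2, v2) = uniform_moments(0.0, 1.0), uniform_moments(-0.1, 1.0)
wls = weighted_least_squares_mean([m1, m2], [v1, v2])  # about 0.477

# The conflation of U(0, 1) and U(-0.1, 1) is uniform on the overlap
# (largest left endpoint, smallest right endpoint) = (0, 1), with mean 1/2.
left, right = max(0.0, -0.1), min(1.0, 1.0)
conflation_mean = (left + right) / 2
```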

To establish the link between conflation and statistical estimators, recall that a random variable $X$ is an unbiased estimator of an unknown parameter $\theta$ if $EX = \theta$, and note that if $X$ is a r.v., then $N(X, \sigma^2)$ is a random normal distribution with variable mean $X$ and fixed variance $\sigma^2$.
Theorem 6.4. If $X_1, \dots, X_n$ are independent unbiased estimators of $\theta$ with finite variances $\sigma_1^2, \dots, \sigma_n^2$, respectively, then $\hat\theta = \operatorname{mean}(\&(N_1, \dots, N_n))$ is the best linear unbiased estimator for $\theta$, where $\{N_i\}$ are the random normal distributions $N_i = N(X_i, \sigma_i^2)$, $i = 1, \dots, n$.



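Proof. By Theorem 6.1, $\&(N_1, \dots, N_n)$ is

$$N\left(\Big(\sum_{i=1}^n \mu_i\sigma_i^{-2}\Big)\Big(\sum_{i=1}^n \sigma_i^{-2}\Big)^{-1},\ \Big(\sum_{i=1}^n \sigma_i^{-2}\Big)^{-1}\right),$$

where $\{\mu_i\}$ and $\{\sigma_i^2\}$ are the means and variances of $\{N_i\}$, respectively. Since $N_i$ is $N(X_i, \sigma_i^2)$ for each $i = 1, \dots, n$, where the $\{X_i\}$ are r.v.'s, this implies that $\&(N_1, \dots, N_n)$ is the random distribution

$$N\left(\Big(\sum_{i=1}^n X_i\sigma_i^{-2}\Big)\Big(\sum_{i=1}^n \sigma_i^{-2}\Big)^{-1},\ \Big(\sum_{i=1}^n \sigma_i^{-2}\Big)^{-1}\right),$$

so

$$\hat\theta = \Big(\sum_{i=1}^n \sigma_i^{-2}\Big)^{-1}\sum_{i=1}^n X_i\sigma_i^{-2}. \qquad (6.1)$$

Since the right hand side of (6.1) is the classical weighted least squares estimator for $\theta$, Aitken's generalization of the Gauss-Markov Theorem (e.g. [1], [14, Theorem 7.8a]) implies that it is the best linear unbiased estimator for $\theta$. □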
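A small simulation can illustrate, though of course not prove, Theorem 6.4. In this sketch (all names and parameter values are assumptions chosen for illustration), each $X_i$ is $\theta$ plus non-normal uniform noise with variance $\sigma_i^2$, and the estimator (6.1) is compared with one competing linear unbiased estimator, the unweighted average.

```python
import random
import statistics


def simulate(theta, sigmas, trials, seed=0):
    """Draw the estimator (6.1) and the plain average over repeated trials.

    Each X_i is theta plus uniform noise on (-sigma_i*sqrt(3), sigma_i*sqrt(3)),
    which has mean 0 and variance sigma_i^2; the uniform noise is an arbitrary
    non-normal choice made for this sketch.
    """
    rng = random.Random(seed)
    w = [s ** -2 for s in sigmas]
    weighted, plain = [], []
    for _ in range(trials):
        xs = [theta + rng.uniform(-s * 3 ** 0.5, s * 3 ** 0.5) for s in sigmas]
        weighted.append(sum(wi * x for wi, x in zip(w, xs)) / sum(w))
        plain.append(sum(xs) / len(xs))
    return weighted, plain


weighted, plain = simulate(theta=3.0, sigmas=[0.5, 1.0, 2.0], trials=20000)
```

With $\sigma = (0.5, 1, 2)$ the theoretical variances are $1/5.25 \approx 0.19$ for (6.1) and $5.25/9 \approx 0.58$ for the plain average; the simulated sample variances should lie close to these.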

Note that normality of the distributions is in the conclusion, not the hypotheses, of Theorem 6.4. If, in addition, the underlying data distributions are normal, this estimator is even a maximum likelihood estimator.

Theorem 6.5. If $X_1, \dots, X_n$ are independent normally-distributed unbiased estimators of $\theta$ with finite variances $\sigma_1^2, \dots, \sigma_n^2$, respectively, then $\hat\theta = \operatorname{mean}(\&(N_1, \dots, N_n))$ is a maximum likelihood estimator for $\theta$, where $\{N_i\}$ are the random normal distributions $N_i = N(X_i, \sigma_i^2)$, $i = 1, \dots, n$.

Proof. Analogous to the proof of Theorem 6.4, using [14, Theorem 7.8b]. □



7 Closure and Truncation Properties of Conflation 

If input data distributions are of a particular form, it is often desirable that consolidation 
of the input also have that same form. Theorem 6.1 showed that the conflation of normal 
distributions is always normal, and the next theorem shows that many other classical families 
of distributions are closed under conflation. 

Recall that a discrete probability distribution is Bernoulli with parameter $p \in [0, 1]$ if its p.m.f. is $p(1) = 1 - p(0) = p$; is geometric with parameter $p \in [0, 1]$ if its p.m.f. is $p(k) = (1-p)^{k-1}p$ for all $k \in \mathbb{N}$; is discrete uniform on $\{1, 2, \dots, n\}$ if its p.m.f. is $p(k) = 1/n$ for all $k \in \{1, 2, \dots, n\}$; is Zipf with parameters $\alpha > 0$ and $n \in \mathbb{N}$ if its p.m.f. is proportional to $k^{-\alpha}$ for all $k \in \{1, 2, \dots, n\}$; and is Zeta with parameter $\alpha > 1$ if its p.m.f. is proportional to $k^{-\alpha}$ for all $k \in \mathbb{N}$. An a.c. probability distribution is gamma with parameters $\alpha \in \mathbb{N}$ and $\beta > 0$ if its p.d.f. is proportional to $x^{\alpha-1}e^{-x/\beta}$ for $x > 0$; is beta with parameters $\alpha > 1$ and $\beta > 1$ if its p.d.f. is proportional to $x^{\alpha-1}(1-x)^{\beta-1}$ for $0 < x < 1$; is uniform on $(a, b)$ for $a < b$ if its p.d.f. is constant $(b-a)^{-1}$ for $a < x < b$; is standard LaPlace (or double-exponential) with parameter $\alpha > 0$ if its p.d.f. is proportional to $e^{-|x|/\alpha}$, $-\infty < x < \infty$; is Pareto with parameters $\alpha > 0$ and $\beta > 0$ if its p.d.f. is proportional to $x^{-(\alpha+1)}$ for $\beta < x < \infty$; and is exponential with mean $\alpha > 0$ if its p.d.f. is proportional to $e^{-x/\alpha}$ for $x > 0$.

Theorem 7.1. Let $P_1, P_2, \dots, P_n$ be compatible.

(i) If $\{P_i\}$ are Bernoulli with parameters $\{p_i\}$ respectively, then $\&(P_1, \dots, P_n)$ is Bernoulli with parameter $p = \dfrac{\prod_{i=1}^n p_i}{\prod_{i=1}^n p_i + \prod_{i=1}^n (1 - p_i)}$.

(ii) If $\{P_i\}$ are geometric with parameters $\{p_i\}$ respectively, then $\&(P_1, \dots, P_n)$ is geometric with parameter $p = 1 - \prod_{i=1}^n (1 - p_i)$.

(iii) If $\{P_i\}$ are discrete uniform on $\{1, \dots, n_i\}$ respectively, then $\&(P_1, \dots, P_n)$ is uniform on $\{1, \dots, \min_i n_i\}$.

(iv) If $\{P_i\}$ are Zipf with parameters $\{\alpha_i\}$ and $\{n_i\}$, respectively, then $\&(P_1, \dots, P_n)$ is Zipf with parameters $\alpha = \sum_{i=1}^n \alpha_i$ and $n = \min_i n_i$.

(v) If $\{P_i\}$ are Zeta with parameters $\{\alpha_i\}$ respectively, then $\&(P_1, \dots, P_n)$ is Zeta with parameter $\alpha = \sum_{i=1}^n \alpha_i$.

(vi) If $\{P_i\}$ are gamma with parameters $\{\alpha_i, \beta_i\}$ respectively, then $\&(P_1, \dots, P_n)$ is gamma with parameters $\alpha = \sum_{i=1}^n \alpha_i - (n-1)$ and $\beta = \big(\sum_{i=1}^n \beta_i^{-1}\big)^{-1}$.

(vii) If $\{P_i\}$ are beta with parameters $\{\alpha_i, \beta_i\}$ respectively, then $\&(P_1, \dots, P_n)$ is beta with parameters $\alpha = \sum_{i=1}^n \alpha_i - (n-1)$ and $\beta = \sum_{i=1}^n \beta_i - (n-1)$.

(viii) If $\{P_i\}$ are continuous uniform on intervals $\{(a_i, b_i)\}$ respectively, then $\&(P_1, \dots, P_n)$ is uniform on $(\max_i a_i, \min_i b_i)$.

(ix) If $\{P_i\}$ are LaPlace with parameters $\{\alpha_i\}$ respectively, then $\&(P_1, \dots, P_n)$ is LaPlace with parameter $\alpha = \big(\sum_{i=1}^n \alpha_i^{-1}\big)^{-1}$.

(x) If $\{P_i\}$ are Pareto with parameters $\{\alpha_i, \beta_i\}$ respectively, then $\&(P_1, \dots, P_n)$ is Pareto with parameters $\alpha = \sum_{i=1}^n \alpha_i + n - 1$ and $\beta = \max_i \beta_i$.

(xi) If $\{P_i\}$ are exponential with means $\{\alpha_i\}$ respectively, then $\&(P_1, \dots, P_n)$ is exponential with mean $\alpha = \big(\sum_{i=1}^n \alpha_i^{-1}\big)^{-1}$.

Proof. Conclusions (i)-(v) follow from Theorem 3.1 and routine calculations, and (vi)-(xi) follow from Theorem 3.3 and routine calculations. □
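Several of the parameter rules in Theorem 7.1 can be spot-checked against the normalized-product characterization of Theorems 3.1 and 3.3. The sketch below is illustrative (all function names are hypothetical): it implements rules (i), (ii) and (xi) and checks, on a finite grid, that the claimed conflation divided by the product of the input p.m.f.'s or p.d.f.'s is constant, a property that on the whole common support characterizes the normalized product.

```python
import math
from fractions import Fraction


def conflate_bernoulli(ps):
    """Parameter in Theorem 7.1(i)."""
    a = math.prod(ps)
    b = math.prod(1 - p for p in ps)
    return a / (a + b)


def conflate_geometric(ps):
    """Parameter in Theorem 7.1(ii)."""
    return 1 - math.prod(1 - p for p in ps)


def conflate_exponential_mean(means):
    """Mean in Theorem 7.1(xi)."""
    return 1 / sum(1 / a for a in means)


def ratio_is_constant(q, densities, points, rel_tol=1e-12):
    """True if q(x) / prod_i f_i(x) takes a single value on `points`."""
    ratios = [q(x) / math.prod(f(x) for f in densities) for x in points]
    return all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios)


def bernoulli(p):
    return lambda k: p if k == 1 else 1 - p


def geometric(p):
    return lambda k: (1 - p) ** (k - 1) * p


def exponential(a):
    return lambda x: math.exp(-x / a) / a


ps = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)]
p_bern = conflate_bernoulli(ps)                     # Fraction(3, 5)
p_geom = conflate_geometric(ps)                     # Fraction(11, 12)
alpha = conflate_exponential_mean([1.0, 2.0, 4.0])  # 4/7
```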






Note that for smaller values of the parameters of beta distributions, the conflation may not be beta simply because the product of the densities may not be integrable. The families of distributions identified in Theorem 7.1 that are closed under conflation are by no means exhaustive. For example, the conflation of $n$ Poisson distributions with means $\lambda_1, \dots, \lambda_n$ is not classical Poisson, but is a discrete Conway-Maxwell-Poisson (CMP) distribution with p.m.f. proportional to $\big(\prod_{i=1}^n \lambda_i\big)^j/(j!)^n$, $j = 0, 1, \dots$, and clearly the CMP family is closed under conflation.
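The Poisson remark can be verified numerically. Assuming Poisson means $\lambda_1, \dots, \lambda_n$ and truncating the support at a large `j_max` (an approximation made only in this sketch; the helper names are hypothetical), the normalized product of the Poisson p.m.f.'s coincides with the CMP weights $(\prod_i \lambda_i)^j/(j!)^n$, and its variance falls below its mean, which no Poisson distribution allows.

```python
import math


def poisson_pmf(lam, j):
    return math.exp(-lam) * lam ** j / math.factorial(j)


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def poisson_conflation(lams, j_max):
    """Normalized product of Poisson p.m.f.'s on {0, ..., j_max}.

    Truncating at j_max is an assumption of this sketch; the neglected tail
    is astronomically small for the small means used below.
    """
    return normalize([math.prod(poisson_pmf(lam, j) for lam in lams) for j in range(j_max + 1)])


def cmp_pmf(lam, nu, j_max):
    """Conway-Maxwell-Poisson weights lam^j / (j!)^nu, normalized on {0, ..., j_max}."""
    return normalize([lam ** j / math.factorial(j) ** nu for j in range(j_max + 1)])


lams, j_max = [2.0, 3.0], 40
conflated = poisson_conflation(lams, j_max)
cmp = cmp_pmf(math.prod(lams), len(lams), j_max)

mean = sum(j * q for j, q in enumerate(conflated))
var = sum(j * j * q for j, q in enumerate(conflated)) - mean ** 2
# var < mean: the conflation is under-dispersed, unlike any Poisson distribution.
```

`j_max` is kept at 40 so that $(j!)^2$ still converts to a finite float.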

Recall that the conflation of Cauchy distributions is not Cauchy, as was shown in Example 3.5. It is easy to see that the families of binomial distributions and of chi-square distributions are not closed under conflation, but chi-square comes very close in the following sense: if $X$ is a random variable with distribution $\&(P_1, \dots, P_n)$ where $\{P_i\}$ are chi-square with $\{k_i\}$ degrees of freedom, respectively, then $nX$ is chi-square with $\sum_{i=1}^n k_i - 2n + 2$ degrees of freedom.
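The chi-square statement can be checked through densities. Indeed, $\prod_i f_{k_i}(x) \propto x^{\sum_i k_i/2 - n}e^{-nx/2}$, a gamma density with rate $n/2$, so it is $nX$ that has rate $1/2$. More generally, if $X$ has the conflated density, then $Y = cX$ has density proportional to $\prod_i f_{k_i}(y/c)$. The sketch below (helper names are illustrative) compares this with the chi-square density with $\sum_i k_i - 2n + 2$ degrees of freedom; the ratio is constant for $c = n$ but not for $c = 1/n$.

```python
import math


def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom at x > 0 (via log-gamma)."""
    log_pdf = (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)
    return math.exp(log_pdf)


def scaled_ratios(ks, ys, scale):
    """prod_i f_{k_i}(y / scale) / chi2_pdf(y, sum(ks) - 2n + 2) for y in ys.

    If X has the conflated density, Y = scale * X has density proportional to
    prod_i f_{k_i}(y / scale), so a constant ratio means Y is chi-square with
    sum(ks) - 2n + 2 degrees of freedom.
    """
    n = len(ks)
    dof = sum(ks) - 2 * n + 2
    return [math.prod(chi2_pdf(y / scale, k) for k in ks) / chi2_pdf(y, dof) for y in ys]


ks = [3, 4, 5]  # n = 3, so the claimed degrees of freedom are 12 - 6 + 2 = 8
ys = [0.5 * i for i in range(1, 30)]
with_nX = scaled_ratios(ks, ys, scale=len(ks))            # Y = nX
with_X_over_n = scaled_ratios(ks, ys, scale=1 / len(ks))  # Y = X/n
```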

In practice, assumptions are often made about the form of the input distributions, such as NIST's essential assumption that underlying data is often normally distributed. But the true and estimated values for Planck's constant clearly are never negative, so the underlying distribution is certainly not truly normally distributed; more likely it is truncated normal. The assumption of exact normality, together with NIST's linearization of the observational equations and subsequent use of generalized least squares ([11], p. 481), introduces further errors into the NIST estimates.

Using conflations, however, the problem of truncation essentially disappears: it is automatically taken into account. The reason is that another important feature of conflation is that it preserves many classes of truncated distributions, where a distribution of a certain type is called truncated if it is the conditional distribution of that type conditioned to be in a (finite or infinite) interval. For example, truncated normal distributions include normal distributions conditioned to be positive (that is, a.c. distributions with density function proportional to $e^{-(x-\mu)^2/(2\sigma^2)}$ for $x > 0$, and zero elsewhere), as is often the case in experimental data involving estimates of many of the fundamental physical constants.

Theorem 7.2. If $P_1, P_2, \dots, P_n$ are compatible truncated normal (exponential, gamma, LaPlace, Pareto) distributions, then $\&(P_1, P_2, \dots, P_n)$ is also a truncated normal (exponential, gamma, LaPlace, Pareto, respectively) distribution.

Proof. Immediate from Theorem 3.3. □
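For truncated normals, Theorem 3.3 multiplies the densities, so the normal factors combine as in Theorem 6.1 while the indicator functions of the truncation intervals multiply to the indicator of their intersection. The sketch below assumes exactly that reading (the function names are hypothetical) and checks, for one example, that the resulting truncated normal is proportional to the product of the input densities on the overlap and vanishes outside it.

```python
import math


def normal_cdf(x, m, v):
    return 0.5 * (1.0 + math.erf((x - m) / math.sqrt(2.0 * v)))


def truncated_normal(m, v, a, b):
    """Density of N(m, v) conditioned on the interval (a, b)."""
    z = normal_cdf(b, m, v) - normal_cdf(a, m, v)

    def pdf(x):
        if not a < x < b:
            return 0.0
        return math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v) / z

    return pdf


def conflate_truncated_normals(params):
    """params: list of (m_i, v_i, a_i, b_i).

    Untruncated parameters combine as in Theorem 6.1; the interval becomes
    the intersection of the (a_i, b_i).
    """
    precision = sum(1.0 / v for _, v, _, _ in params)
    mean = sum(m / v for m, v, _, _ in params) / precision
    a = max(p[2] for p in params)
    b = min(p[3] for p in params)
    if not a < b:
        raise ValueError("truncation intervals do not overlap")
    return mean, 1.0 / precision, a, b


# N(1, 1) conditioned to be positive, and N(2, 4) conditioned on (-1, 3).
params = [(1.0, 1.0, 0.0, math.inf), (2.0, 4.0, -1.0, 3.0)]
q = truncated_normal(*conflate_truncated_normals(params))
fs = [truncated_normal(*p) for p in params]
inside = [0.1 * i for i in range(1, 30)]  # points in the overlap (0, 3)
ratios = [q(x) / (fs[0](x) * fs[1](x)) for x in inside]
```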

The above example of determination of the values of the fundamental physical constants is only one among many scientific situations where consolidation of dissimilar data is problematic. Some government agencies, such as the Methods and Data Comparability Board of the National Water Quality Monitoring Council ([10]), have even established special programs to address this issue. Perhaps the method of conflating input data will provide a practical and simple, yet optimal and rigorous, method to address this problem.